Input string without knowing the size - c

What is the right way to store a string when I don't know its size in advance?
This is what I do:
#include <stdio.h>
#include <conio.h>
int main () {
char * str;
str = (char *)malloc(sizeof(char) + 1);
str[1] = '\0';
int i = 0;
int c = '\0';
do {
c = getche();
if(c != '\r'){
str[i] = c;
str[i + 1] = '\0';
i++;
str = (char *)realloc(str, sizeof(char) + i + 2);
}
} while(c != '\r');
printf("\n%s\n", str);
free(str);
return 0;
}
I found this page:
Dynamically prompt for string without knowing string size
Is it correct? If it is, then:
Is there any better way?
Is there more efficient way?

Is it correct?
No
The main problem is the use of realloc. It is just wrong. When using realloc never directly assign to the pointer that points to the already allocated memory - always use a temporary to take the return value. Like:
char *temp;
temp = realloc(str, 1 + i + 2);
if (temp == NULL)
{
    /* out of memory: str still points to the old, valid block */
    free(str);
    return 1; /* or whatever error handling fits your program */
}
str = temp;
The reason for this is that realloc may fail in which case it will return NULL. So if you assign directly to str and realloc fails, you have lost the pointer to the allocated memory (aka the string).
Besides that:
1) Don't cast malloc and realloc
2) sizeof(char) is always 1 - so you don't need to use it - just put 1
Is there any better way?
Is there more efficient way?
Instead of reallocating by 1 in each loop - which is pretty expensive performance wise - it is in many cases better to (re)allocate a bigger chunk.
One strategy is to double the allocation whenever calling realloc. So if you have allocated 128 bytes the next allocation should be 2*128=256. Another strategy is to let it grow with some fixed size which is significantly bigger than 1 - for instance you could let it grow with 1024 each time.
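A minimal sketch of the doubling strategy, assuming input is read from a FILE * with fgetc rather than getche. The helper name read_line and the starting capacity of 16 are made up for illustration; they are not from the question.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line from `in` into a heap buffer whose
 * capacity doubles whenever it fills up. Returns NULL on allocation
 * failure, or if end-of-file is reached before any character is read.
 * The caller must free() the result. */
char *read_line(FILE *in)
{
    size_t cap = 16;               /* arbitrary starting capacity */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int c;
    while ((c = fgetc(in)) != EOF && c != '\n') {
        if (len + 1 >= cap) {      /* always keep one byte for '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {     /* old block is still valid: release it */
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {    /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

With doubling, reading n characters triggers only about log2(n) calls to realloc instead of n.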

I suggest using a buffer to avoid repeated realloc calls. Create a buffer of arbitrary size, e.g. 1024 bytes. When it fills up, realloc your dynamically allocated storage to make more room and memmove the buffer's contents into it.

The key to answering this question is to clarify the term "without knowing the size".
We may not know what amount of data we're going to get, but we may know what we're going to do with it.
Let us consider the following use cases:
We have restrictions on the data we need, for example: a person's name, an address, the title of a book. I guess we are good with 1k or a maximum of 16k of space.
We obtain a continuous flow of data, for example: some sensor or other equipment sends us data every second. In this case, we could process the data in chunks.
Answer:
We need to make an educated guess about the size we intend to process and allocate space accordingly.
We have to process data on the fly and we need to release the space that is no longer required.
Note:
It is important to note that we can't allocate an unlimited amount of memory. At some point we have to implement error handling and/or store the data on 'disk' or somewhere else.
Note II:
In case a more memory-efficient solution is needed, using realloc is not recommended, as it can temporarily double the memory in use while running: if the system cannot simply extend the allocated block in place, it first allocates a new block and copies the current contents into it. Instead, an application-specific memory structure would be required. But I assume that is beyond the scope of the original question.

Is it correct?
Sort of.
We don't cast the result of malloc() in C.
Is there any better way?
That's primarily opinion-based.
Is there more efficient way?
With regards to time or space?
If you are asking about space, no.
If you are asking about time, yes.
You could dynamically allocate memory for an array with a small size, that would hold the string for a while. Then, when the array would not be able to hold the string any longer, you would reallocate that memory and double its size. And so on, until the whole string is read. When you are done, you could reallocate the memory again, and shrink the size to be the exact number you need for your string.
You see, calling realloc() is costly in time, since it may have to move the whole memory block: the memory must be contiguous, and there might not be enough free space right after the string's block to grow it in place.
Note: Of course, a fixed-size, statically created array would be better in terms of time, but worse in terms of memory. Everything is a trade-off; that's where you come into play and decide what best suits your application.
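The final "shrink to the exact size" step could look roughly like this. The helper name shrink_to_fit is hypothetical; the only assumption is that buf came from malloc/realloc and holds a string of len characters.

```c
#include <stdlib.h>

/* Hypothetical helper: trims a heap buffer that holds a string of length
 * `len` down to len + 1 bytes. If realloc() fails, the original block is
 * still valid, so it is returned unchanged. */
char *shrink_to_fit(char *buf, size_t len)
{
    char *tmp = realloc(buf, len + 1);
    return tmp != NULL ? tmp : buf;
}
```

Since a failed shrink leaves the old block usable, there is no need to treat it as an error here.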

How about this?
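char *string_name;
/* asprintf is a GNU/BSD extension (not standard C); it allocates the buffer and returns -1 on failure */
if (asprintf(&string_name, "Hello World, my name is %s & I'm %d years old!", "James Bond", 27) != -1) {
    printf("string is %s", string_name);
    free(string_name);
}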

Related

Using realloc to reduce the size of a memory block

A little more than 20 years ago I had some grasp of writing something small in C , but even at that time, I probably didn't really do things right all the time. Now I'm trying to learn C again, so I'm really a newbie.
Based on this article:
Using realloc to shrink the allocated memory
, I made this test, which works, but troubles me:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int test (char *param) {
char *s = malloc(strlen(param));
strcpy(s, param);
printf("original string : [%4d] %s \n", strlen(s), s);
// reduce size
char *tmp = realloc(s, 5);
if (tmp == NULL) {
printf("Failed\n");
free(s);
exit(1);
} else {
tmp[4] = 0;
}
s = tmp;
printf("the reduced string : [%4d] %s\n", strlen(s), s );
free(s);
}
void main(void){
test("This is a string with a certain length!");
}
If I leave out "tmp[4] = 0", then I still get back the whole string. Does this mean the rest of the string is still in memory, but not allocated anymore?
how does c free memory anyway? Does it keep track of memory by itself or is it something that is handled by the OS?
I free the s string "free(s)", do I also need to free the tmp str (it does point to the same memory block, yet the (same) address it holds is probably stored on another memory location?
These are most likely just basics, but none of what I have read so far has given me a clear answer (including mentioned article).
If I leave out "tmp[4] = 0", then I still get back the whole string.
You've invoked undefined behavior. All the string operations require the argument to be a null-terminated array of characters. If you reduce the size of the allocation so it doesn't include the null terminator, you're accessing outside the allocation when it tries to find it.
Does this mean the rest of the string is still in memory, but not allocated anymore?
In practice, many implementations don't actually re-allocate anything when you shrink the size. They simply update the bookkeeping information to say that the allocated length is shorter, and return the original pointer. So the remainder of the string stays the same unless you do another allocation that happens to use that memory.
This can even happen when you grow the size. Some designs always allocate memory in specific granularities (e.g. powers of 2), so if you grow the allocation but it doesn't exceed the granularity, it doesn't need to copy the data.
how does c free memory anyway? Does it keep track of memory by itself or is it something that is handled by the OS?
Heap management is part of the C runtime library. It can use a variety of strategies.
I free the s string "free(s)", do I also need to free the tmp str (it does point to the same memory block, yet the (same) address it holds is probably stored on another memory location?
After s = tmp;, both s and tmp point to the same allocated memory block. You only need to free one of them.
BTW, the initial allocation should be:
char *s = malloc(strlen(param)+1);
You need to add 1 for the null terminator, since strlen() doesn't count this.
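Putting these points together, a corrected version of the experiment might look like the sketch below. The function name copy_truncated is made up for illustration. It writes the terminator before shrinking, so the string is valid at every point.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies `src`, then shrinks the copy so that it
 * holds at most `keep` characters plus the terminator.
 * Returns NULL if the initial allocation fails. */
char *copy_truncated(const char *src, size_t keep)
{
    size_t len = strlen(src);
    char *s = malloc(len + 1);          /* + 1 for the '\0' */
    if (s == NULL)
        return NULL;
    memcpy(s, src, len + 1);

    if (keep < len) {
        s[keep] = '\0';                 /* terminate inside the kept part */
        char *tmp = realloc(s, keep + 1);
        if (tmp != NULL)                /* on failure the old block stays valid */
            s = tmp;
    }
    return s;                           /* one block, so one free() */
}
```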

Can we insert element at the end of an array without copying all its content?

I'm trying to reproduce the behavior of a std::vector in C, and I have something on my mind. A vector has contiguously allocated memory, which as I understand it means that when you add an element at the end it does not copy the whole array to another block of memory but instead just allocates another element at the end, which improves performance. In C I can't figure out a way to reproduce that behavior; the only way I can find to add an element at the end is this:
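void *new_array;
size_t new_size = old_size + 1; /* element count after the append */
new_array = malloc(size_type * new_size);
memcpy(new_array, old_array, old_size * size_type); /* copy the existing elements */
memcpy((char *)new_array + old_size * size_type, new_value, size_type); /* append the new element */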
I'm pretty sure that a std::vector doesn't proceed like that. Is it possible to reproduce its behavior without allocating a big block of memory beforehand?
What you might be looking for is realloc from <stdlib.h>. It doesn't necessarily copy all the data: depending on the implementation, the allocator can often extend the existing block in place. As William Pursell mentioned in the comments of another answer:
Quite often, realloc can grow a chunk of memory without needing to do a copy.
I have provided a simple example of how to use realloc below.
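#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
    /* 5 characters plus the terminating `\0` */
    char *str = malloc(6);
    if (str == NULL)
        return EXIT_FAILURE;
    strcpy(str, "Hello");
    /* grow by one byte; keep the old pointer until realloc succeeds */
    char *tmp = realloc(str, 7);
    if (tmp == NULL) {
        free(str);
        return EXIT_FAILURE;
    }
    str = tmp;
    str[5] = '!';
    str[6] = '\0';
    printf("%s\n", str);
    free(str);
    return EXIT_SUCCESS;
}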
From what I have seen, a std::vector also goes through reallocation when its capacity is exhausted. In that case, a new block with a larger capacity (commonly 1.5 to 2 times the old one, depending on the implementation) is allocated, followed by a copy or move from the old block into the new one. As a result of this reallocation, iterators and references into the vector are invalidated. For this very reason, std::vector is considered 'unstable'. The counterpart is stable_vector in Boost (https://www.boost.org/doc/libs/1_75_0/doc/html/container/non_standard_containers.html#container.non_standard_containers.stable_vector)
std::vector's reallocation often has performance penalties. As a result, it's often desirable to call reserve on a vector at the time of its initialization if the final size of the vector is known in advance.
You cannot add an element to an array without reallocating the memory it occupies, unless the current array already has space for the new element.
So reallocating an array generally means copying its stored elements into a new extent of memory (except in cases where the current extent is simply enlarged in place).
You can do this "manually", as it is done in C++, or in C you can use the standard function realloc, which does such copying itself if required.
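To illustrate, here is a rough sketch of a vector-like append in C that keeps a separate length and capacity, so most appends touch no allocator at all. The struct int_vec and the function int_vec_push are hypothetical names, and the growth factor of 2 is just one common choice.

```c
#include <stdlib.h>

/* Hypothetical minimal growable array of int. */
struct int_vec {
    int *data;    /* heap block, or NULL while empty */
    size_t len;   /* number of elements in use */
    size_t cap;   /* number of elements allocated */
};

/* Appends `value`; returns 0 on success, -1 if memory runs out
 * (in which case the vector is left unchanged). */
int int_vec_push(struct int_vec *v, int value)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        /* realloc(NULL, n) behaves like malloc(n) */
        int *tmp = realloc(v->data, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        v->data = tmp;
        v->cap = new_cap;
    }
    v->data[v->len++] = value;
    return 0;
}
```

Usage: start from `struct int_vec v = { NULL, 0, 0 };`, call int_vec_push as often as needed, and free(v.data) at the end.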

How to allocate memory to store 5 names without wasting even 1 byte

I want to store 5 names without wasting even 1 byte. How can I allocate the memory using malloc?
That's for all practical purposes impossible, malloc will more often than not return blocks of memory bigger than requested.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
    int i;
    char line[256]; /* temporary input buffer; longer names are cut off */
    char *p[5];     /* array of 5 pointers, one per name */
    for (i = 0; i < 5; i++)
    {
        printf("please enter the name\n"); /* prompt the user */
        if (fgets(line, sizeof line, stdin) == NULL)
            return 1; /* input ended early */
        line[strcspn(line, "\n")] = '\0'; /* strip the trailing newline */
        p[i] = malloc(strlen(line) + 1); /* exactly the name plus its '\0' */
        if (p[i] == NULL)
            return 1;
        strcpy(p[i], line);
    }
    for (i = 0; i < 5; i++)
        free(p[i]);
    return 0;
}
If you know the cumulative length of the five names, let's call it length_names, you could do a
void *pNameBlock = malloc(length_names + 5);
Then you could store the names, null terminated (the +5 is for the null termination), one right after the other in the memory pointed to by pNameBlock.
char *pName1 = (char *) pNameBlock;
Store the name data starting at pName1. Maybe via
char *p = pName1;
You can then write byte by byte (the following is pseudo-codeish).
*p++ = byte1;
*p++ = byte2;
etc.
End with a null termination:
*p++ = '\0';
Now set
char *pName2 = p;
and write the second name using p, as above.
Doing things this way will still waste some memory. Malloc will internally get itself more memory than you are asking for, but it will waste that memory only once, on this one operation, getting this one block, with no overhead beyond this once.
Be very careful, though, because under this way of doing things, you can't free() the char *s, such as pName1, for the names. You can only free that one pointer you got that one time, pNameBlock.
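The single-block approach can be sketched as follows, assuming the names are already available as C strings. The function name pack_names and its parameters are made up for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies `count` strings back to back into ONE
 * malloc'd block and stores the start of each copy in out[i].
 * Returns the block (the only pointer that may be passed to free()),
 * or NULL on allocation failure. */
char *pack_names(const char *const names[], size_t count, char *out[])
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += strlen(names[i]) + 1;      /* + 1 per terminator */

    char *block = malloc(total ? total : 1);
    if (block == NULL)
        return NULL;

    char *p = block;
    for (size_t i = 0; i < count; i++) {
        size_t n = strlen(names[i]) + 1;
        memcpy(p, names[i], n);             /* copies the '\0' too */
        out[i] = p;
        p += n;                             /* next name starts right after */
    }
    return block;
}
```

Note that out[0] equals the returned block, while out[1] onwards point into its interior and must never be freed individually.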
If you are asking this question out of interest, ok. But if you are this memory constrained, you're going to have a very very hard time. malloc does waste some memory, but not a lot. You're going to have a hard time working with C this constrained. You'd almost have to write your own super light weight memory manager (do you really want to do that?). Otherwise, you'd be better off working in assembly, if you can't afford to waste even a byte.
I have a hard time imagining what kind of super-cramped embedded system imposes this kind of limit on memory usage.
If you don't want to waste any bytes storing the names, you can dynamically allocate a two-dimensional (jagged) char array in C.
A two-dimensional array in C can be implemented as a pointer to an array of pointers.
char **name; // Pointer to a pointer (the beginning of an array of char pointers)
name = malloc(sizeof(char *) * 5); // Allocate space for the pointer array, for 5 names
name[0] = malloc(lengthOfName1 + 1); // Allocate space for the first name plus its '\0'; same for the other names
name[1] = malloc(lengthOfName2 + 1);
....
Now you can save each name to its corresponding position in the array without allocating more space, even though the names might have different lengths.
You can use a pointer to a pointer and write each name character by character, advancing the pointer as you go; that way you can store all 5 names without wasting memory.
But as a programmer you should not need to do this kind of tedious work: use an array of pointers to store the names and allocate memory for each one step by step.
This is only about the concept of storing names; if you are dealing with a large amount of data, a linked list may be a better way to store it.
When you malloc a block, it actually allocates a bit more memory than you asked for. This extra memory is used to store information such as the size of the allocated block.
Encode the names in binary and store them in a byte array.
What is "memory waste"? If you can define it clearly, then a solution can be found.
For example, the null in a null terminated string might be considered "wasted memory" because the null isn't printed; however, another person might not consider it memory waste because without it, you need to store a second item (string length).
When I use a byte, the byte is fully used. Only if you can show me how it might be done without that byte will I consider your claims of memory waste valid. I use the nulls at the ends of my strings. If I declare an array of strings, I use the array too. Make what you need, and then if you find that you can rearrange those items to use less memory, decide that the other way wasted some memory. Until then, you're chasing a dream which you haven't finished.
If these five "names" are assembly jump points, you don't need a full string's worth of memory to hold them. If the five "names" are block scoped variables, perhaps they won't need any more memory than the registers already provide. If they are strings, then perhaps you can combine and overlay strings; but, until you come up with a solution, and a second solution to compare the first against, you don't have a case for wasted / saved memory.

What happens to memory after '\0' in a C string?

Surprisingly simple/stupid/basic question, but I have no idea: Suppose I want to return the user of my function a C-string, whose length I do not know at the beginning of the function. I can place only an upper bound on the length at the outset, and, depending on processing, the size may shrink.
The question is, is there anything wrong with allocating enough heap space (the upper bound) and then terminating the string well short of that during processing? i.e. If I stick a '\0' into the middle of the allocated memory, does (a.) free() still work properly, and (b.) does the space after the '\0' become inconsequential? Once '\0' is added, does the memory just get returned, or is it sitting there hogging space until free() is called? Is it generally bad programming style to leave this hanging space there, in order to save some upfront programming time computing the necessary space before calling malloc?
To give this some context, let's say I want to remove consecutive duplicates, like this:
input "Hello oOOOo !!" --> output "Helo oOo !"
... and some code below showing how I'm pre-computing the size resulting from my operation, effectively performing processing twice to get the heap size right.
char* RemoveChains(const char* str)
{
if (str == NULL) {
return NULL;
}
if (strlen(str) == 0) {
char* outstr = (char*)malloc(1);
*outstr = '\0';
return outstr;
}
const char* original = str; // for reuse
char prev = *str++; // [prev][str][str+1]...
unsigned int outlen = 1; // first char auto-counted
// Determine length necessary by mimicking processing
while (*str) {
if (*str != prev) { // new char encountered
++outlen;
prev = *str; // restart chain
}
++str; // step pointer along input
}
// Declare new string to be perfect size
char* outstr = (char*)malloc(outlen + 1);
outstr[outlen] = '\0';
outstr[0] = original[0];
outlen = 1;
// Construct output
prev = *original++;
while (*original) {
if (*original != prev) {
outstr[outlen++] = *original;
prev = *original;
}
++original;
}
return outstr;
}
If I stick a '\0' into the middle of the allocated memory, does
(a.) free() still work properly, and
Yes.
(b.) does the space after the '\0' become inconsequential? Once '\0' is added, does the memory just get returned, or is it sitting there hogging space until free() is called?
Depends. Often, when you allocate large amounts of heap space, the system first allocates virtual address space - as you write to the pages some actual physical memory is assigned to back it (and that may later get swapped out to disk when your OS has virtual memory support). Famously, this distinction between wasteful allocation of virtual address space and actual physical/swap memory allows sparse arrays to be reasonably memory efficient on such OSs.
Now, the granularity of this virtual addressing and paging is in memory page sizes - that might be 4k, 8k, 16k...? Most OSs have a function you can call to find out the page size. So, if you're doing a lot of small allocations then rounding up to page sizes is wasteful, and if you have a limited address space relative to the amount of memory you really need to use then depending on virtual addressing in the way described above won't scale (for example, 4GB RAM with 32-bit addressing). On the other hand, if you have a 64-bit process running with say 32GB of RAM, and are doing relatively few such string allocations, you have an enormous amount of virtual address space to play with and the rounding up to page size won't amount to much.
But - note the difference between writing throughout the buffer then terminating it at some earlier point (in which case the once-written-to memory will have backing memory and could end up in swap) versus having a big buffer in which you only ever write to the first bit then terminate (in which case backing memory is only allocated for the used space rounded up to page size).
It's also worth pointing out that on many operating systems heap memory may not be returned to the Operating System until the process terminates: instead, the malloc/free library notifies the OS when it needs to grow the heap (e.g. using sbrk() on UNIX or VirtualAlloc() on Windows). In that sense, free() memory is free for your process to re-use, but not free for other processes to use. Some Operating Systems do optimise this - for example, using a distinct and independently releasable memory region for very large allocations.
Is it generally bad programming style to leave this hanging space there, in order to save some upfront programming time computing the necessary space before calling malloc?
Again, it depends on how many such allocations you're dealing with. If there are a great many relative to your virtual address space / RAM - you want to explicitly let the memory library know not all the originally requested memory is actually needed using realloc(), or you could even use strdup() to allocate a new block more tightly based on actual needs (then free() the original) - depending on your malloc/free library implementation that might work out better or worse, but very few applications would be significantly affected by any difference.
Sometimes your code may be in a library where you can't guess how many string instances the calling application will be managing - in such cases it's better to provide slower behaviour that never gets too bad... so lean towards shrinking the memory blocks to fit the string data (a set number of additional operations so doesn't affect big-O efficiency) rather than having an unknown proportion of the original string buffer wasted (in a pathological case - zero or one character used after arbitrarily large allocations). As a performance optimisation you might only bother returning memory if unused space is >= the used space - tune to taste, or make it caller-configurable.
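As a sketch of the "allocate the upper bound, then shrink" alternative for the duplicate-removal example: the function name remove_chains_one_pass is hypothetical, and it relies on the fact that the output can never be longer than the input.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-pass variant: allocates the upper bound
 * (strlen(str) + 1), builds the result, then hands the unused tail
 * back with realloc(). Returns NULL on NULL input or allocation failure. */
char *remove_chains_one_pass(const char *str)
{
    if (str == NULL)
        return NULL;
    size_t upper = strlen(str) + 1;
    char *out = malloc(upper);
    if (out == NULL)
        return NULL;

    size_t n = 0;                           /* characters written so far */
    for (const char *s = str; *s; ++s)
        if (n == 0 || out[n - 1] != *s)     /* skip repeats of the last char */
            out[n++] = *s;
    out[n] = '\0';

    if (n + 1 < upper) {                    /* something to give back */
        char *tmp = realloc(out, n + 1);
        if (tmp != NULL)                    /* a failed shrink is harmless */
            out = tmp;
    }
    return out;
}
```

Whether this beats the two-pass version depends on the allocator and the input; profiling is the only reliable way to tell.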
You comment on another answer:
So it comes down to judging whether the realloc will take longer, or the preprocessing size determination?
If performance is your top priority, then yes - you'd want to profile. If you're not CPU bound, then as a general rule take the "preprocessing" hit and do a right-sized allocation - there's just less fragmentation and mess. Countering that, if you have to write a special preprocessing mode for some function - that's an extra "surface" for errors and code to maintain. (This trade-off decision is commonly needed when implementing your own asprintf() from snprintf(), but there at least you can trust snprintf() to act as documented and don't personally have to maintain it).
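For reference, the snprintf-based two-pass pattern mentioned above can be sketched like this, assuming a C99 vsnprintf (which returns the length that would have been written). The name format_alloc is made up so it does not clash with a real asprintf.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical asprintf-like helper built on vsnprintf().
 * Pass 1 measures the output, pass 2 writes into a right-sized block.
 * Returns NULL on formatting or allocation failure; caller frees. */
char *format_alloc(const char *fmt, ...)
{
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);                       /* a va_list can only be walked once */

    int needed = vsnprintf(NULL, 0, fmt, ap);   /* pass 1: measure only */
    va_end(ap);
    if (needed < 0) {
        va_end(ap2);
        return NULL;
    }

    char *out = malloc((size_t)needed + 1);
    if (out != NULL)
        vsnprintf(out, (size_t)needed + 1, fmt, ap2);   /* pass 2: write */
    va_end(ap2);
    return out;
}
```

Here the "preprocessing" pass comes for free because vsnprintf already implements it.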
Once '\0' is added, does the memory just get returned, or is it
sitting there hogging space until free() is called?
There's nothing magical about \0. You have to call realloc if you want to "shrink" the allocated memory. Otherwise the memory will just sit there until you call free.
If I stick a '\0' into the middle of the allocated memory, does (a.)
free() still work properly
Whatever you do in that memory free will always work properly if you pass it the exact same pointer returned by malloc. Of course if you write outside it all bets are off.
\0 is just one more character from malloc's and free's perspective; they don't care what data you put in the memory. So free will still work whether you add \0 in the middle or don't add \0 at all. The extra space allocated will still be there; it won't be handed back to the allocator just because you add \0 to the memory. I personally would prefer to allocate only the required amount of memory instead of allocating some upper bound, as that would just waste resources.
As soon as you get memory from heap by calling malloc(), the memory is yours to use. Inserting \0 is like inserting any other character. This memory will remain in your possession until you free it or until OS claims it back.
The \0 is a pure convention for interpreting character arrays as strings - it is independent of memory management. I.e., if you want to get your money back, you should call realloc. The string does not care about memory (which is a source of many security problems).
malloc just allocates a chunk of memory. It's up to you to use it however you want and to call free with the initial pointer. Inserting '\0' in the middle has no consequence.
To be specific, malloc doesn't know what type of memory you want (it returns only a void pointer).
Let us assume you allocate 10 bytes of memory spanning addresses 0x10 to 0x19:
char * ptr = (char *)malloc(sizeof(char) * 10);
Inserting a null at the 5th position (0x14) does not free the memory from 0x15 onwards.
However, a free of 0x10 frees the entire chunk of 10 bytes.
free() will still work with a NUL byte in memory
the space will remain wasted until free() is called, or unless you subsequently shrink the allocation
Generally, memory is memory is memory. It doesn't care what you write into it. BUT it has a race, or if you prefer a flavor (malloc, new, VirtualAlloc, HeapAlloc, etc). This means that the party that allocates a piece of memory must also provide the means to deallocate it. If your API comes in a DLL, then it should provide a free function of some sort.
This of course puts a burden on the caller right?
So why not put the WHOLE burden on the caller?
The BEST way to deal with dynamically allocated memory is to NOT allocate it yourself. Have the caller allocate it and pass it on to you. He knows what flavor he allocated, and he is responsible to free it whenever he is done using it.
How does the caller know how much to allocate?
Like many Windows APIs, have your function return the required number of bytes when called with, e.g., a NULL pointer, and then do the job when provided with a non-NULL pointer (using IsBadWritePtr, if it is suitable for your case, to double-check accessibility).
This can also be much much more efficient. Memory allocations COST a lot. Too many memory allocations cause heap fragmentation and then the allocations cost even more. That's why in kernel mode we use the so called "look-aside lists". To minimize the number of memory allocations done, we reuse the blocks we have already allocated and "freed", using services that the NT Kernel provides to driver writers.
If you pass on the responsibility for memory allocation to your caller, then he might be passing you cheap memory from the stack (_alloca), or passing you the same memory over and over again without any additional allocations. You don't care of course, but you DO allow your caller to be in charge of optimal memory handling.
To elaborate on the use of the NULL terminator in C:
You cannot allocate a "C string" you can allocate a char array and store a string in it, but malloc and free just see it as an array of the requested length.
A C string is not a data type but a convention for using a char array where the null character '\0' is treated as the string terminator.
This is a way to pass strings around without having to pass a length value as a separate argument. Some other programming languages have explicit string types that store a length along with the character data to allow passing strings in a single parameter.
Functions that document their arguments as "C strings" are passed char arrays but have no way of knowing how big the array is without the null terminator so if it is not there things will go horribly wrong.
You will notice functions that expect char arrays that are not necessarily treated as strings will always require a buffer length parameter to be passed.
For example if you want to process char data where a zero byte is a valid value you can't use '\0' as a terminator character.
You could do what some of the MS Windows APIs do where you (the caller) pass a pointer and the size of the memory you allocated. If the size isn't enough, you're told how many bytes to allocate. If it was enough, the memory is used and the result is the number of bytes used.
Thus the decision about how to efficiently use memory is left to the caller. They can allocate a fixed 260 bytes (common when working with paths in Windows) and use the result from the function call to know whether more bytes are needed (not the case with paths, since MAX_PATH is 260 unless you bypass the Win32 API limits) or whether most of the bytes can be ignored...
The caller could also pass zero as the memory size and be told exactly how much needs to be allocated - not as efficient processing-wise, but could be more efficient space-wise.
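A rough sketch of that calling convention, applied to the duplicate-removal example. The function name remove_chains_into and its exact contract (always return the required size, truncate if the buffer is too small) are assumptions for illustration, not a real Windows API.

```c
#include <stddef.h>

/* Hypothetical caller-allocates interface: writes the de-duplicated copy
 * of `src` into `dst` (truncated if it does not fit) and ALWAYS returns
 * the number of bytes required, including the '\0'.
 * Call with dst == NULL and dst_size == 0 to query the size only. */
size_t remove_chains_into(const char *src, char *dst, size_t dst_size)
{
    size_t n = 0;       /* characters produced so far */
    char prev = '\0';   /* never equals a character inside the string */

    for (const char *s = src; *s; ++s) {
        if (*s != prev) {
            if (dst != NULL && n + 1 < dst_size)
                dst[n] = *s;                /* store only while it fits */
            n++;
            prev = *s;
        }
    }
    if (dst != NULL && dst_size > 0)
        dst[n < dst_size ? n : dst_size - 1] = '\0';
    return n + 1;
}
```

The caller compares the return value with the size it passed in: if it is larger, the output was truncated and a bigger buffer is needed.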
You can certainly preallocate to an upperbound, and use all or something less.
Just make sure you actually use all or something less.
Making two passes is also fine.
You asked the right questions about the tradeoffs.
How do you decide?
Use two passes, initially, because:
1. you'll know you aren't wasting memory.
2. you're going to profile to find out where you need to optimize for speed anyway.
3. upperbounds are hard to get right before you've written and tested and modified and used and updated the code in response to new requirements for a while.
4. simplest thing that could possibly work.
You might tighten up the code a little, too. Shorter is usually better. And the more the code takes advantage of known truths, the more comfortable I am that it does what it says.
char* copyWithoutDuplicateChains(const char* str)
{
    if (str == NULL) return NULL;
    const char* s = str;
    char prev = *s; // [prev][s+1]...
    size_t outlen = 1; // counts the terminating '\0'
    // Determine length necessary by mimicking processing
    while (*s)
    {   while (*++s == prev); // skip duplicates
        ++outlen; // new character (or the terminator) encountered
        prev = *s; // restart chain
    }
    // Construct output
    char* outstr = malloc(outlen);
    if (outstr == NULL) return NULL;
    char* out = outstr; // write cursor; outstr keeps the start of the block
    s = str;
    prev = *s; // reset for the second pass
    *out++ = *s; // first character copied ('\0' if str is empty)
    while (*s)
    {   while (*++s == prev); // skip duplicates
        *out++ = *s; // copy new character; the last one copied is the '\0'
        prev = *s; // restart chain
    }
    // done
    return outstr;
}

Character arrays in C

I'm new to c. Just have a question about the character arrays (or string) in c: When I want to create a character array in C, do I have to give the size at the same time?
Because we may not know the size that we actually need. For example of client-server program, if we want to declare a character array for the server program to receive a message from the client program, but we don't know the size of the message, we could do it like this:
char buffer[1000];
recv(fd,buffer, 1000, 0);
But what if the actual message is only of length 10. Will that cause a lot of wasted memory?
Yes, you have to decide the dimension in advance, even if you use malloc.
When you read from sockets, as in the example, you usually use a buffer of a reasonable size and dispatch the data into another structure as soon as you consume it. In any case, 1000 bytes is not much of a memory waste, and it is certainly faster than asking a memory manager for one byte at a time :)
Yes, you have to give the size if you are not initializing the char array at the time of declaration. A better approach for your problem is to determine a suitable buffer size at run time and allocate the memory dynamically.
What you're asking about is how to dynamically size a buffer. This is done with a dynamic allocation such as using malloc() -- a memory allocator. Using it gives you an important responsibility though: when you're done using the buffer you must return it to the system yourself. If using malloc() [or calloc()], you return it with free().
For example:
char *buffer; // pointer to a buffer -- essentially an unsized array
buffer = (char *)malloc(size);
// use the buffer ...
free(buffer); // return the buffer -- do NOT use it any more!
The only problem left to solve is how to determine the size you'll need. If you're recv()'ing data that hints at the size, you'll need to break the communication into two steps: first receive the fixed-size part that every packet has (and that tells you the size), then allocate the full buffer, then recv the rest.
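As an illustration, here is a sketch for POSIX sockets that assumes a made-up wire format: every message starts with a 4-byte length in network byte order, followed by that many payload bytes. The names recv_exact and recv_message are hypothetical, and a real server would also cap the accepted length.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

/* recv() may return fewer bytes than asked for, so loop until done.
 * Returns 0 on success, -1 on error or if the peer closed early. */
static int recv_exact(int fd, void *buf, size_t n)
{
    char *p = buf;
    while (n > 0) {
        ssize_t got = recv(fd, p, n, 0);
        if (got <= 0)
            return -1;
        p += got;
        n -= (size_t)got;
    }
    return 0;
}

/* Hypothetical reader for the ASSUMED format "4-byte big-endian length,
 * then payload". Returns a malloc'd, '\0'-terminated buffer (caller
 * frees) and stores the payload length in *out_len; NULL on failure. */
char *recv_message(int fd, uint32_t *out_len)
{
    uint32_t net_len;
    if (recv_exact(fd, &net_len, sizeof net_len) != 0)  /* step 1: header */
        return NULL;
    uint32_t len = ntohl(net_len);

    char *msg = malloc((size_t)len + 1);                /* step 2: allocate */
    if (msg == NULL)
        return NULL;
    if (recv_exact(fd, msg, len) != 0) {                /* step 3: payload */
        free(msg);
        return NULL;
    }
    msg[len] = '\0';
    *out_len = len;
    return msg;
}
```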
When you don't know the exact amount of input data, do as follows:
1. Create a small buffer
2. Allocate some memory for a "storage" (e.g. twice the buffer size)
3. Fill the buffer with data from the input stream (e.g. socket, file etc.)
4. Copy the data from the buffer to the storage
4.1 If there is not enough room in the storage, re-allocate the memory (e.g. to twice its current size)
5. Repeat steps 3 and 4 until "END OF STREAM"
Your storage contains the data now.
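The procedure above can be sketched for a FILE * stream as follows. The function name read_all and the 256-byte chunk size are arbitrary choices for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads everything from `in` until end of stream.
 * Returns a malloc'd buffer with a '\0' appended (caller frees) and
 * stores the number of data bytes in *out_len; NULL on allocation failure. */
char *read_all(FILE *in, size_t *out_len)
{
    char chunk[256];                    /* small fixed buffer */
    size_t cap = 2 * sizeof chunk;      /* storage starts at twice the buffer */
    size_t len = 0;
    char *storage = malloc(cap);
    if (storage == NULL)
        return NULL;

    size_t got;
    while ((got = fread(chunk, 1, sizeof chunk, in)) > 0) {  /* fill buffer */
        while (len + got + 1 > cap) {   /* not enough room: double the storage */
            char *tmp = realloc(storage, cap * 2);
            if (tmp == NULL) {
                free(storage);
                return NULL;
            }
            storage = tmp;
            cap *= 2;
        }
        memcpy(storage + len, chunk, got);  /* copy buffer into storage */
        len += got;
    }                                   /* fread returning 0 ends the loop */
    storage[len] = '\0';
    *out_len = len;
    return storage;
}
```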
If you don't know the size a-priori, then you have no choice but to create it dynamically using malloc (or whatever equivalent mechanism in your language of choice.)
size_t buffer_size = ...; /* read from a DEFINE or from a config file */
char * buffer = malloc( sizeof( char ) * (buffer_size + 1) );
Creating a buffer of size m, but only receiving an input string of size n with n < m is not a waste of memory, but an engineering compromise.
If you create your buffer with a size close to the intended input, you risk having to refill the buffer many, many times for those cases where n >> m. Typically, iterations over the buffer are tied up with I/O operations, so now you might be saving some bytes (which is really nothing on today's hardware) at the expense of potentially increasing the problems at some other end. Especially for client-server apps. If we were talking about resource-constrained embedded systems, that'd be another thing.
You should be worrying about getting your algorithms right and solid. Then you worry, if you can, about shaving off a few bytes here and there.
For me, I'd rather create a buffer that is 2 to 10 times greater than the average input (not the smallest input as in your case, but the average), assuming my input tends to have a low standard deviation in size. Otherwise, I'd go 20 times the size or more (especially if memory is cheap and doing this minimizes hitting the disk or the NIC.)
At the most basic setup, one typically gets the size of the buffer as a configuration item read off a file (or passed as an argument), and defaulting to a default compile time value if none is provided. Then you can adjust the size of your buffers according to the observed input sizes.
More elaborate algorithms (say TCP) adjust the size of their buffers at run-time to better accommodate input whose size might/will change over time.
Even if you use malloc, you must still specify a size first! So instead you pick a number large enough to hold the message, like:
char buffer[2000];
With a dynamically allocated buffer, if the message turns out to be smaller or larger, you can reallocate it to release the unused space or to get more room.
Example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
    char *str, *tmp;
    /* Initial memory allocation */
    str = malloc(15);
    if (str == NULL)
        return 1;
    strcpy(str, "tutorialspoint");
    printf("String = %s, Address = %p\n", str, (void *)str);
    /* Reallocating memory; keep the old pointer until realloc succeeds */
    tmp = realloc(str, 25);
    if (tmp == NULL) {
        free(str);
        return 1;
    }
    str = tmp;
    strcat(str, ".com");
    printf("String = %s, Address = %p\n", str, (void *)str);
    free(str);
    return 0;
}
Note: make sure to include stdlib.h, which declares malloc, realloc and free

Resources