Proper memory allocation? - c

How would I only allocate as much memory as really needed without knowing how big the arguments to the function are?
Usually, I would use a fixed size, and calculate the rest with sizeof (note: the code isn't supposed to make sense, but to show the problem):
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
int test(const char* format, ...)
{
    char* buffer;
    int bufsize;
    int status;
    va_list arguments;
    va_start(arguments, format);
    bufsize = 1024; /* fixed size */
    bufsize = sizeof(arguments) + sizeof(format) + 1024;
    buffer = (char*)malloc(bufsize);
    status = vsprintf(buffer, format, arguments);
    fputs(buffer, stdout);
    va_end(arguments);
    return status;
}
int main()
{
    const char* name = "World";
    test("Hello, %s\n", name);
    return 0;
}
However, I don't think this is the way to go... so, how would I calculate the required buffer size properly here?

If you have vsnprintf available to you, I would make use of that. It prevents buffer overflow since you provide the buffer size, and it returns the actual size needed.
So allocate your 1K buffer, then attempt to use vsnprintf to write into that buffer, limiting the size. If the size returned was less than your buffer size, then it has worked and you can just use the buffer. (The returned size excludes the terminating null, so a value equal to the buffer size means the output was truncated.)
If the size returned was greater than or equal to the buffer size, then call realloc to get a bigger buffer (the returned size plus one for the terminator) and try it again. Provided the data hasn't changed (e.g., threading issues), the second attempt will work fine since you already know how big the output will be.
This is relatively efficient provided you choose your default buffer size carefully. If the vast majority of your outputs are within that limit, very few reallocations have to take place (see below for a possible optimisation).
If you don't have a vsnprintf-type function, a trick we've used before is to open a file handle to /dev/null and use that for the same purpose (checking the size before outputting to a buffer). Use vfprintf to that file handle to get the size (the output goes to the bit bucket), then allocate enough space based on the return value (plus one for the terminator), and vsprintf to that buffer. Again, it should be large enough since you've figured out the needed size.
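The /dev/null trick could be sketched roughly as follows. This is only an illustration under assumptions: the helper name format_via_devnull is made up here, and it presumes a Unix-like system where /dev/null can be opened for writing.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the answer): returns a malloc'd formatted
 * string, or NULL on failure. The caller must free() the result. */
char *format_via_devnull(const char *format, ...)
{
    va_list args;
    FILE *sink = fopen("/dev/null", "w");   /* assumes a Unix-like system */
    char *buffer = NULL;
    int length;

    if (sink == NULL)
        return NULL;

    /* First pass: the output is discarded, only the count matters. */
    va_start(args, format);
    length = vfprintf(sink, format, args);
    va_end(args);
    fclose(sink);

    if (length < 0)
        return NULL;

    /* Second pass: the buffer is now known to be large enough. */
    buffer = malloc((size_t)length + 1);    /* +1 for the terminator */
    if (buffer != NULL) {
        va_start(args, format);             /* restart the argument list */
        vsprintf(buffer, format, args);
        va_end(args);
    }
    return buffer;
}
```

Opening /dev/null on every call is wasteful; a real program would probably keep the handle open between calls.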
An optimisation to the methods above would be to use a local buffer, rather than an allocated buffer, for the 1K chunk. This avoids having to use malloc in those situations where it's unnecessary, assuming your stack can handle it.
In other words, use something like:
int test(const char* format, ...)
{
    char buff1k[1024];
    char *buffer = buff1k; // default to local buffer, no malloc.
    :
    int need = 1 + vsnprintf (buffer, sizeof (buff1k), format, arguments);
    if (need > (int) sizeof (buff1k)) {
        buffer = malloc (need);
        // Now you have a big-enough buffer, vsprintf into there
        // (with a fresh va_list: the first one was consumed above).
    }
    // Use string at buffer for whatever you want.
    ...
    // Only free buffer if it was allocated.
    if (buffer != buff1k)
        free (buffer);
}
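For reference, a complete version of the local-buffer idea might look like the sketch below. The function name format_string and its parameters are assumptions made for illustration; the one detail worth noting is va_copy, since a va_list cannot be reused after it has been passed to vsnprintf.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: formats into `local` when the result fits, otherwise
 * into a freshly malloc'd buffer. Returns NULL on error. The caller frees
 * the result only if it differs from `local`. */
char *format_string(char *local, size_t local_size, const char *format, ...)
{
    va_list args, args_copy;
    char *buffer = local;
    int length;

    va_start(args, format);
    va_copy(args_copy, args);   /* keep a second list for the retry */

    length = vsnprintf(local, local_size, format, args);
    va_end(args);

    if (length < 0) {
        buffer = NULL;                          /* formatting error */
    } else if ((size_t)length >= local_size) {
        /* Did not fit: allocate exactly what is needed and format again. */
        buffer = malloc((size_t)length + 1);    /* +1 for the terminator */
        if (buffer != NULL)
            vsnprintf(buffer, (size_t)length + 1, format, args_copy);
    }
    va_end(args_copy);
    return buffer;
}
```

A caller would pass its own stack array as `local` and compare the returned pointer against it before calling free().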

Related

C redirect fprintf into a buffer or char array

I have the following function and I am wondering if there is a way to pass string or char array instead of stdout into it so I can get the printed representation as a string.
void print_Type(Type t, FILE *f)
{
    fprintf(f,"stuff ...");
}
print_Type(t, stdout);
I have already tried this:
int SIZE = 100;
char buffer[SIZE];
print_Type(t, buffer);
But this is what I am seeing:
�����
Something like this
FILE* f = fmemopen(buffer, sizeof(buffer), "w");
print_Type(t, f);
fclose(f);
The fmemopen(void *buf, size_t size, const char *mode) function opens a stream. The stream allows I/O to be performed on the string or memory buffer pointed to by buf.
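A self-contained sketch of the fmemopen approach, for illustration only. Since Type is not defined in the question, print_value below is a hypothetical stand-in for print_Type, and value_to_buffer is an invented wrapper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical stand-in for print_Type: writes to any stream. */
static void print_value(int value, FILE *f)
{
    fprintf(f, "value = %d", value);
}

/* Renders into a caller-supplied buffer.
 * Returns 0 on success, -1 if the stream could not be opened or closed cleanly. */
int value_to_buffer(int value, char *buffer, size_t size)
{
    FILE *f = fmemopen(buffer, size, "w");
    if (f == NULL)
        return -1;
    print_value(value, f);
    /* Closing flushes the stream; a null byte is appended if there is room. */
    return fclose(f) == 0 ? 0 : -1;
}
```

The existing function keeps its FILE * parameter, so no change to the printing code is needed.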
Yes, there is sprintf(); notice the leading s rather than f.
int SIZE = 100;
char buffer[SIZE];
sprintf(buffer, "stuff %d", 10);
This function prints to a string (s) rather than a file (f). It has exactly the same properties and parameters as fprintf(); the only difference is the destination, which must be a char array (either statically allocated as an array or dynamically allocated, usually via malloc).
Note: This function is dangerous as it does not check the length and can easily overrun the end of the buffer if you are not careful.
If you are using a later version of C (C99 or newer), a better function is snprintf, which adds the extra buffer length checking.
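To illustrate the length checking, here is a small sketch of how the snprintf return value can be used to detect truncation. The function name format_number is invented for this example.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the text fit in the buffer,
 * 0 if it was truncated, and -1 on a formatting error. */
int format_number(char *buffer, size_t size, int n)
{
    /* snprintf never writes more than `size` bytes, terminator included,
     * and returns the length the full output would have had. */
    int written = snprintf(buffer, size, "stuff %d", n);
    if (written < 0)
        return -1;
    return (size_t)written < size ? 1 : 0;
}
```

With a 4-byte buffer the result is cut down to three characters plus the terminator, and the function reports the truncation instead of overrunning the array.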
The problem with fmemopen is that it cannot resize the buffer. fmemopen did exist in Glibc for quite some time, but it was standardized only in POSIX.1-2008. But that revision included another function that handles dynamic memory allocation: open_memstream(3):
char *buffer = NULL;
size_t size = 0;
FILE* f = open_memstream(&buffer, &size);
print_Type(t, f);
fclose(f);
buffer will now point to a null-terminated buffer, with size bytes before the extra null terminator. I.e. if you didn't write any null bytes, then strlen(buffer) == size.
Thus the only merit of fmemopen is that it can be used to write to a memory buffer at a fixed location or of fixed length, whereas open_memstream should be used everywhere else, where the location of the buffer does not matter.
For fmemopen there is yet another undesired feature: writes may fail only when the buffer is being flushed, and not before. Since the target is in memory, there is no point in buffering the writes, so if you choose to use fmemopen, the Linux manual page fmemopen(3) recommends disabling buffering with setbuf(f, NULL);
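Wrapped into a function with error checks, the open_memstream variant might look like this sketch. As before, print_value is a hypothetical stand-in for print_Type, and value_to_string is an invented name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for print_Type: writes to any stream. */
static void print_value(int value, FILE *f)
{
    fprintf(f, "value = %d", value);
}

/* Returns a malloc'd string with the printed representation, or NULL on
 * failure. The caller must free() the result. */
char *value_to_string(int value)
{
    char *buffer = NULL;
    size_t size = 0;
    FILE *f = open_memstream(&buffer, &size);

    if (f == NULL)
        return NULL;
    print_value(value, f);
    /* buffer and size are only guaranteed valid after fflush or fclose. */
    if (fclose(f) != 0) {
        free(buffer);
        return NULL;
    }
    return buffer;
}
```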

obstack, gets and getline

I am trying to get a line from stdin. As far as I understand, we should never use gets, as the man page of gets says:
Never use gets(). Because it is impossible to tell without knowing
the data in advance how many characters gets() will read, and
because gets() will continue to store characters past the end of the
buffer, it is extremely dangerous to use. It has been used to
break computer security. Use fgets() instead.
It suggests that we can use fgets() instead. The problem with fgets() is that we don't know the size of the user input in advance, and fgets() reads at most one less than size bytes from the stream, as the man page says:
fgets() reads in at most one less than size characters from stream
and stores them into the buffer pointed to by s. Reading stops
after an EOF or a newline. If a newline is read, it is stored into
the buffer. A terminating null byte ('\0') is stored after the last
character in the buffer.
Another approach is POSIX getline(), which uses realloc to update the buffer size so that we can read a string of arbitrary length from the input stream, as the man page says:
Alternatively, before calling getline(), *lineptr can contain a
pointer to a malloc(3)-allocated buffer *n bytes in size. If the
buffer is not large enough to hold the line, getline() resizes it
with realloc(3), updating *lineptr and *n as necessary.
And finally there is another approach, which is using obstack, as the libc manual says:
Aside from this one constraint of order of freeing, obstacks are
totally general: an obstack can contain any number of objects of
any size. They are implemented with macros, so allocation is
usually very fast as long as the objects are usually small. And the
only space overhead per object is the padding needed to start each
object on a suitable boundary...
So we can use obstack for any object of any size, and allocation is very fast with a little space overhead, which is not a big deal. I wrote this code to read an input string without knowing its length.
#include <stdio.h>
#include <stdlib.h>
#include <obstack.h>
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free
int main(){
    unsigned char c;
    struct obstack * mystack;
    mystack = (struct obstack *) malloc(sizeof(struct obstack));
    obstack_init(mystack);
    c = fgetc(stdin);
    while(c!='\r' && c!='\n'){
        obstack_1grow(mystack,c);
        c = fgetc(stdin);
    }
    printf("the size of the stack is: %d\n",obstack_object_size(mystack));
    printf("the input is: %s\n",(char *)obstack_finish(mystack));
    return 0;
}
So my questions are:
Is it safe to use obstack like this?
Is it like using POSIX getline?
Am I missing something here? Any drawbacks?
Why shouldn't I use it?
Thanks in advance.
fgets has no drawbacks over gets. It just forces you to acknowledge that you must know the size of the buffer. gets instead requires you to somehow magically know beforehand the length of the input a (possibly malicious) user is going to feed into your program. That is why gets was removed from the C programming language. It is now non-standard, while fgets is standard and portable.
As for knowing the length of the line beforehand, POSIX says that a utility must be prepared to handle lines that fit in buffers that are of LINE_MAX size. Thus you can do:
char line[LINE_MAX];
while (fgets(line, LINE_MAX, fp) != NULL)
and any file that produces problems with that is not a standard text file. In practice everything will be mostly fine if you just don't blindly assume that the last character in the buffer is always '\n' (which it isn't).
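A possible way to avoid the newline assumption is sketched below. The helper name read_trimmed_line is made up for this example, and the fallback value for LINE_MAX is an assumption for systems whose <limits.h> does not define it.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

#ifndef LINE_MAX
#define LINE_MAX 2048   /* assumed fallback if <limits.h> does not define it */
#endif

/* Hypothetical helper: reads one line into `line`, stripping the trailing
 * newline when present. Returns 1 if something was read, 0 at end of input
 * or on error. */
int read_trimmed_line(FILE *fp, char *line, size_t size)
{
    if (fgets(line, (int)size, fp) == NULL)
        return 0;
    /* strcspn finds the newline, or the end of the string if there is none,
     * so this is safe for a last line that lacks a newline. */
    line[strcspn(line, "\n")] = '\0';
    return 1;
}
```

A caller would declare char line[LINE_MAX]; and loop while read_trimmed_line(fp, line, sizeof line) returns 1.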
getline is a POSIX standard function. obstack is a GNU libc extension that is not portable. getline was built for efficient reading of lines from files; obstack was not: it was built to be generic. With obstack, the string is not properly contiguous in memory / in its final place, until you call obstack_finish.
Use getline if on POSIX, use fgets in programs that need to be maximally portable; look for an emulation of getline for non-POSIX platforms built on fgets.
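For comparison, typical getline usage on a POSIX system looks roughly like this. The function longest_line is only an invented example of a loop body; the getline call itself follows the POSIX signature.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical example: returns the length of the longest line in fp,
 * not counting the newline. */
size_t longest_line(FILE *fp)
{
    char *line = NULL;      /* getline allocates and grows this buffer */
    size_t capacity = 0;
    size_t longest = 0;
    ssize_t length;

    while ((length = getline(&line, &capacity, fp)) != -1) {
        if (length > 0 && line[length - 1] == '\n')
            length--;       /* the newline is part of what getline returns */
        if ((size_t)length > longest)
            longest = (size_t)length;
    }
    free(line);             /* one buffer is reused for every line */
    return longest;
}
```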
Why shouldn't I use it?
Well, you shouldn't use getline() if you care about portability. You should use getline() if you're specifically targeting only POSIX systems.
As for obstacks, they're specific to the GNU C library, which might already be a strong reason to avoid them (it further restricts portability). Also, they're not meant to be used for this purpose.
If you aim for portability, just use fgets(). It's not too complicated to write a function similar to getline() based on fgets() -- here's an example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define CHUNKSIZE 1024
char *readline(FILE *f)
{
    size_t bufsize = CHUNKSIZE;
    char *buf = malloc(bufsize);
    if (!buf) return 0;
    buf[0] = '\0'; // so that empty input yields an empty string
    char *pos = buf;
    size_t len = 0;
    while (fgets(pos, CHUNKSIZE, f))
    {
        char *nl = strchr(pos, '\n');
        if (nl)
        {
            // newline found, replace with string terminator
            *nl = '\0';
            // shrink the buffer to the exact size needed
            char *tmp = realloc(buf, len + strlen(pos) + 1);
            if (tmp) return tmp;
            return buf;
        }
        // no newline, increase buffer size
        len += strlen(pos);
        char *tmp = realloc(buf, len + CHUNKSIZE);
        if (!tmp)
        {
            free(buf);
            return 0;
        }
        buf = tmp;
        pos = buf + len;
    }
    // handle case when input ends without a newline
    char *tmp = realloc(buf, len + 1);
    if (tmp) return tmp;
    return buf;
}
int main(void)
{
    char *input = readline(stdin);
    if (!input)
    {
        fputs("Error reading input!\n", stderr);
        return 1;
    }
    puts(input);
    free(input);
    return 0;
}
This one removes the newline if it was found and returns a newly allocated buffer (which the caller has to free()). Adapt to your needs. It could be improved by increasing the buffer size only when the buffer was filled completely, with just a bit more code ...
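One possible shape for that improvement is sketched here, under the assumption that doubling the buffer is an acceptable growth policy. The name readline_doubling and the starting capacity of 64 are arbitrary choices for illustration. Unlike the version above, it does not shrink the buffer before returning.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant: grows the buffer only when it is completely full.
 * Returns a malloc'd line without the newline, or NULL if allocation fails.
 * At end of input it returns an empty string. */
char *readline_doubling(FILE *f)
{
    size_t capacity = 64;
    size_t len = 0;
    char *buf = malloc(capacity);

    if (!buf) return NULL;
    buf[0] = '\0';

    while (fgets(buf + len, (int)(capacity - len), f)) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[--len] = '\0';              /* strip the newline and stop */
            break;
        }
        if (len + 1 == capacity) {          /* buffer completely filled */
            char *tmp = realloc(buf, capacity * 2);
            if (!tmp) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            capacity *= 2;
        }
    }
    return buf;
}
```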

How do I write a C function that returns a variable-length string?

I need to be able to check in a kernel module whether or not a file descriptor, dentry, or inode falls under a certain path. To do this, I am going to have to write a function that when given a dentry or a file descriptor (not sure which, yet), will return said object's full path name.
What is the way to write a function that returns variable-length strings?
You can try like this:
char *myFunction(void)
{
    char *word;
    word = malloc (some_random_length); // a byte count, not sizeof of it
    //add some random characters
    return word;
}
You can also refer to the related thread: best practice for returning a variable length string in c
The typical way to do this in C, is not to return anything at all:
void func (char* buf, size_t buf_size, size_t* length);
Here buf is a pointer to the buffer which will hold the string, allocated by the caller; buf_size is the size of that buffer; and length is how much of that buffer the function used.
You could return a pointer to buf as done by for example strcpy. But this doesn't make much sense, since the same pointer already exists in one of the parameters. It adds nothing but confusion.
(Don't use strcpy, strcat etc functions as some role model for how to write functions. Many C standard library functions have obscure prototypes, because they are so terribly old, from a time when good programming practice wasn't invented, or at least not known by Dennis Ritchie.)
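As a concrete illustration of that prototype, here is a sketch with two extra input parameters. The function join_path and its dir and name arguments are invented for the example; only the buf, buf_size and length parameters come from the prototype above.

```c
#include <stdio.h>

/* Hypothetical example of the caller-allocated style: joins dir and name
 * into buf. *length receives the number of characters stored, excluding
 * the terminator. */
void join_path(char *buf, size_t buf_size, size_t *length,
               const char *dir, const char *name)
{
    int needed = snprintf(buf, buf_size, "%s/%s", dir, name);

    if (needed < 0 || buf_size == 0)
        *length = 0;                    /* nothing usable was written */
    else if ((size_t)needed >= buf_size)
        *length = buf_size - 1;         /* output was truncated to fit */
    else
        *length = (size_t)needed;
}
```

The caller owns the buffer in both the success and the truncation case, so there is nothing to free.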
There are two common approaches:
One is to have a fixed size buffer to store the result:
int makeFullPath(char *buffer,size_t max_size,...)
{
    int actual_size = snprintf(buffer,max_size,...);
    return actual_size;
}
Examples of standard functions which use this approach are strncpy() and snprintf(). This approach has the advantage that no dynamic memory allocation is needed, which will give better performance for time-critical functions. The downside is that it puts more responsibility on the caller to be able to determine the largest possible result size in advance or be ready to reallocate if a larger size is necessary.
The second common approach is to calculate how big of a buffer to use and allocate that many bytes internally:
// Caller eventually needs to free() the result.
char* makeFullPath(...)
{
    size_t max_size = calculateFullPathSize(...);
    char *buffer = malloc(max_size);
    if (!buffer) return NULL;
    int actual_size = snprintf(buffer,max_size,...);
    assert(actual_size<max_size);
    return buffer;
}
An example of a standard function that uses this approach is strdup(). The advantage is that the caller no longer needs to worry about the size, but they now need to make sure that they free the result. For a kernel module, you would use kmalloc() and kfree() instead of malloc() and free().
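In user space, the size calculation can be delegated to snprintf itself by calling it once with a NULL buffer. The sketch below assumes a path built from two hypothetical parts, dir and name; make_full_path is an invented name, and kernel code would need kmalloc() and the kernel's own formatting helpers instead.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical user-space example: returns a malloc'd "dir/name" string,
 * or NULL on failure. The caller must free() the result. */
char *make_full_path(const char *dir, const char *name)
{
    /* First call only measures: with size 0 nothing is written. */
    int length = snprintf(NULL, 0, "%s/%s", dir, name);
    if (length < 0)
        return NULL;

    char *buffer = malloc((size_t)length + 1);  /* +1 for the terminator */
    if (buffer == NULL)
        return NULL;

    snprintf(buffer, (size_t)length + 1, "%s/%s", dir, name);
    return buffer;
}
```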
A less common approach is to have a static buffer:
const char *makeFullPath(...)
{
    static char buffer[MAX_PATH];
    int actual_size = snprintf(buffer,MAX_PATH,...);
    return buffer;
}
This avoids the caller having to worry about the size or freeing the result, and it is also efficient, but it has the downside that the caller now has to make sure that they don't call the function a second time while the result of the first call is still being used.
const char *result1 = makeFullPath(...);
const char *result2 = makeFullPath(...);
printf("%s",result1);
printf("%s",result2); /* oops! */
Here, the caller probably meant to print two separate strings, but they'll actually just get the second string twice. This is also problematic in multi-threaded code, and probably unusable for kernel code.
For example:
char * fn( int file_id )
{
static char res[MAX_PATH];
// fill res[]
return res;
}
/*
let's do it the BSTR way (BasicString of VB):
store the length just before the characters
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char * CopyString(const char *str){
    unsigned short len;
    char *buff;
    len=(unsigned short)strlen(str); /* portable replacement for the Windows-only lstrlen */
    buff=malloc(sizeof(short)+len+1);
    if(buff){
        ((short*)buff)[0]=len+1; /* stored length includes the terminator */
        buff=(char*)&((short*)buff)[1];
        strcpy(buff,str);
    }
    return buff;
}
#define len_of_string(s) (((short*)(s))[-1])
#define free_string(s) free(&((short*)(s))[-1])
int main(){
    char *buff=CopyString("full_path_name");
    if(buff){
        printf("len of string= %d\n",len_of_string(buff));
        free_string(buff);
    }else{
        printf("Error: malloc failed\n");
    }
    return 0;
}
/*
now you can imagine how to reallocate the string to a new size
*/

Call to malloc, unknown size

I am getting the current working directory with _getcwd. The function requires a pointer to the buffer and the buffer's size.
In my code I am using:
char *cwdBuf;
cwdBuf = malloc(100);
I don't know the size of the buffer needed, so I am reserving far more memory than is needed. What I would like to do is use the correct amount of memory.
Is there any way to do this?
What is your target platform? The getcwd() documentation here makes two important points:
As an extension to the POSIX.1-2001 standard, Linux (libc4, libc5, glibc) getcwd() allocates the buffer dynamically using malloc() if buf is NULL on call. In this case, the allocated buffer has the length size unless size is zero, when buf is allocated as big as necessary. It is possible (and, indeed, advisable) to free() the buffers if they have been obtained this way...
...The buf argument should be a pointer to an array at least PATH_MAX bytes long. getwd() does only return the first PATH_MAX bytes of the actual pathname.
There is usually a MAX_PATH macro defined which you can use. Also is there any reason not to just allocate on the stack?
Edit:
From the MSDN Docs:
#include <direct.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main( void )
{
    char* buffer;
    // Get the current working directory:
    if( (buffer = _getcwd( NULL, 0 )) == NULL )
        perror( "_getcwd error" );
    else
    {
        // strlen replaces strnlen, which needs a second (maximum length) argument
        printf( "%s \nLength: %d\n", buffer, (int)strlen(buffer) );
        free(buffer);
    }
}
So it looks like if you pass in NULL, it allocates a buffer for you.
The size of the current working directory is unknown as such but its upper limit can be asked for.
The correct way of allocating memory to use for getcwd() is by querying the system for the maximum length buffer required, via pathconf(".", _PC_PATH_MAX); before allocating the buffer and calling getcwd().
The OpenGroup UNIX standard manpages document this for getcwd() in the example code given. Quote:
#include <stdlib.h>
#include <unistd.h>
...
long size;
char *buf;
char *ptr;
size = pathconf(".", _PC_PATH_MAX);
if ((buf = (char *)malloc((size_t)size)) != NULL)
ptr = getcwd(buf, (size_t)size);
...
How about realloc() once you know the size of the result?
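Combining the two suggestions, a POSIX sketch could allocate the maximum first and shrink afterwards. The function name current_directory and the 4096-byte fallback are assumptions made for this example; pathconf may return -1 when the system reports no limit.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns the current directory in a buffer of exactly
 * the needed size, or NULL on failure. The caller must free() the result. */
char *current_directory(void)
{
    long max = pathconf(".", _PC_PATH_MAX);
    if (max <= 0)
        max = 4096;     /* assumed fallback when no limit is reported */

    char *buf = malloc((size_t)max);
    if (buf == NULL)
        return NULL;

    if (getcwd(buf, (size_t)max) == NULL) {
        free(buf);
        return NULL;
    }

    /* Now the real length is known, so give the unused part back. */
    char *exact = realloc(buf, strlen(buf) + 1);
    return exact != NULL ? exact : buf;
}
```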

C: A safer way to check buffer in function and append to it?

I have a function that needs to return the time to a logging function, and it looks like this:
//put time in to buf, format 00:00:00\0
void gettimestr(char buf[9]) {
    if(strlen(buf) != 9) { //experimental error checking
        fprintf(stderr, "Buf appears to be %d bytes and not 9!\n", strlen( buf ));
    }
    time_t cur_time;
    time(&cur_time);
    struct tm *ts = localtime(&cur_time);
    sprintf(buf, "%02d:%02d:%02d",
            ts->tm_hour,
            ts->tm_min,
            ts->tm_sec );
    strncat(buf, "\0", 1);
}
Now I guess the main problem is checking whether the buffer is long enough: sizeof() returns the size of a pointer, and strlen seems to randomly return 0 or something such as 12 on two different calls.
My first question is, how would I be able to detect the size of the buffer safely, is it possible?
My other question is, is accepting buf[9] a favourable method or should I accept a pointer to a buffer, and use strcat() instead of sprintf() to append the time to it? sprintf makes it easier for padding zeros to the time values, although it seems to only accept a character array and not a pointer.
Your function assumes that the buffer being passed in already contains a null-terminated string with 9 characters. That doesn't make sense.
The proper way would be to request the size as an argument:
void gettimestr(char *buf, int bufferSize) {
and use snprintf:
snprintf(buf, bufferSize, "%02dx....", ....);
And terminate the string since snprintf won't do that if you exceed the limit:
buf[bufferSize-1] = 0;
You can call your function like this:
char buffer[16];
gettimestr(buffer, sizeof(buffer));
There is no other way to determine the size. This isn't Java where an array knows its size. Passing a char * will simply send a pointer down to the function with no further information, so your only way to get the size of the buffer is by requiring the caller to specify it.
(EDIT: snprintf should always terminate the string properly, as pointed out in the comments.)
@EboMike is right. Just to complement his answer, you could check the buffer with:
void gettimestr(char *buf, int bufferSize) {
    if (!buf) {
        fprintf(stderr, "Null buffer\n");
        return;
    }
    // rest of the code
}
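Putting both answers together, a complete version might look like the sketch below. It is a variation rather than the exact code from either answer: it takes a size_t instead of an int and returns a status code, both of which are choices made for this example.

```c
#include <stdio.h>
#include <time.h>

/* Writes the local time as "HH:MM:SS" into buf.
 * Returns 0 on success, -1 if the buffer is missing or too small,
 * or if the time could not be obtained. */
int gettimestr(char *buf, size_t buf_size)
{
    if (buf == NULL || buf_size < 9)    /* 8 characters plus terminator */
        return -1;

    time_t now = time(NULL);
    struct tm *ts = localtime(&now);
    if (ts == NULL)
        return -1;

    /* snprintf always terminates the string when buf_size is non-zero. */
    int n = snprintf(buf, buf_size, "%02d:%02d:%02d",
                     ts->tm_hour, ts->tm_min, ts->tm_sec);
    return n == 8 ? 0 : -1;
}
```

It would be called as gettimestr(buffer, sizeof(buffer)) with buffer declared as an array in the caller, so that sizeof yields the array size rather than a pointer size.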
