Call to malloc, unknown size - c

I am getting the current working directory with _getcwd. The function requires a pointer to the buffer and the buffer's size.
In my code I am using:
char *cwdBuf;
cwdBuf = malloc(100);
I don't know how large the buffer needs to be, so I am reserving far more memory than is needed. What I would like to do is allocate only the amount actually required.
Is there any way to do this?

What is your target platform? The getcwd() documentation here makes two important points:
As an extension to the POSIX.1-2001 standard, Linux (libc4, libc5, glibc) getcwd() allocates the buffer dynamically using malloc() if buf is NULL on call. In this case, the allocated buffer has the length size unless size is zero, when buf is allocated as big as necessary. It is possible (and, indeed, advisable) to free() the buffers if they have been obtained this way...
...The buf argument should be a pointer to an array at least PATH_MAX bytes long. getwd() does only return the first PATH_MAX bytes of the actual pathname.

There is usually a MAX_PATH macro defined which you can use. Also is there any reason not to just allocate on the stack?
Edit:
From the MSDN Docs:
#include <direct.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main( void )
{
    char* buffer;
    // Get the current working directory:
    if( (buffer = _getcwd( NULL, 0 )) == NULL )
        perror( "_getcwd error" );
    else
    {
        // strlen returns a size_t, so print it with %zu
        printf( "%s \nLength: %zu\n", buffer, strlen(buffer) );
        free(buffer);
    }
}
So it looks like if you pass in NULL, it allocates a buffer for you.

The size of the current working directory is unknown as such but its upper limit can be asked for.
The correct way of allocating memory to use for getcwd() is by querying the system for the maximum length buffer required, via pathconf(".", _PC_PATH_MAX); before allocating the buffer and calling getcwd().
The OpenGroup UNIX standard manpages document this for getcwd() in the example code given. Quote:
#include <stdlib.h>
#include <unistd.h>
...
long size;
char *buf;
char *ptr;
size = pathconf(".", _PC_PATH_MAX);
if ((buf = (char *)malloc((size_t)size)) != NULL)
ptr = getcwd(buf, (size_t)size);
...

How about realloc() once you know the size of the result?
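The realloc() idea can be combined with a retry loop. Below is a minimal sketch, assuming a POSIX system where getcwd() fails with ERANGE when the buffer is too small. The helper name get_cwd_alloc and the starting size of 64 bytes are made up for illustration.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns the current directory in a malloc'd
 * buffer sized to fit, or NULL on failure. The caller frees it. */
char *get_cwd_alloc(void)
{
    size_t size = 64;               /* arbitrary starting guess */
    char *buf = NULL;

    for (;;) {
        char *tmp = realloc(buf, size);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;

        if (getcwd(buf, size) != NULL)
            break;                  /* the path fit */
        if (errno != ERANGE) {      /* a real error, not "too small" */
            free(buf);
            return NULL;
        }
        size *= 2;                  /* too small: double and retry */
    }

    /* Shrink to the exact length; keep the old block if that fails. */
    char *exact = realloc(buf, strlen(buf) + 1);
    return exact != NULL ? exact : buf;
}
```

Doubling keeps the number of retries small even for very long paths, and the final realloc() gives back whatever was over-allocated.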

Related

assigning a string and array of strings with only one malloc

I'm making a webserver in C, and I want to allocate just one chunk of memory for everything (strings and arrays).
My allocation strategy starts with the following, where bp is the buffer pointer used for searches:
char *bp, *buf=malloc(1048576); // allocate 1MB
The first 64KB is the maximum space for the full, unprocessed HTTP request (I'm not dealing with uploaded-file requests). The remainder of the 1MB will hold each header, which hopefully makes them easy to access.
Now if I programmed the extraction code this way, I'd have no problem:
char *httpreq=buf+65536;
int linesize=8192; //size of each line
int httprn=0; // HTTP request header number; increments for each header found
char *crlf;
while((crlf=strstr(bp,"\r\n"))){ //loop until no more CRLFs are found
    memmove(httpreq+(httprn*linesize),bp,crlf-bp);
    httpreq[httprn*linesize+(crlf-bp)]='\0'; //terminate the copied line
    bp=crlf+2; //continue searching after the CRLF
    httprn++;
}
But I'd rather program the code this way:
int linesize=8192; //size of each line
char *httpreq[linesize]=buf+65536;
int httprn=0;
while((crlf=strstr(bp,"\r\n"))){
memmove(httpreq[httprn++],bp+=2,crlf-bp); //skip CRLF
}
However the C compiler tells me that I have an invalid initializer and its referring to this particular line:
char *httpreq[linesize]=buf+65536;
Is there any way I can use this kind of syntax:
httpreq[n]
instead of this:
httpreq+(linesize*n)
to read the HTTP header n without having to use local static memory?
This:
char httpreq[n][n];
would use static memory, but I'd rather use extended memory for string allocation.
Any ideas?
Yes, but you need to construct the pointers properly. Here is an example of what you want to achieve:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define LINESIZE 8192
int main(void) {
    char *buf = (char *) malloc(1048576);
    char (*ht)[LINESIZE];
    if (buf == NULL) return 1;
    ht = (char (*)[])( buf + 65536);
    printf("bf %p\n", (void *) buf);       /* %p expects a void * */
    printf("ht[0] %p\n", (void *) ht[0] );
    printf("ht[1] %p\n", (void *) ht[1] );
    sprintf(ht[0],"%s\n", "This is the first line");
    sprintf(ht[1],"%s\n", "This is the second line");
    printf("%s", ht[0]);
    printf("%s", ht[1]);
    free(buf);
    return 0;
}
So char (*ht)[LINESIZE] tells the compiler that ht is a pointer to an array of LINESIZE chars, which means ht[n] is the n-th block of LINESIZE bytes counted from the address ht points to.
The cast (char (*)[])(buf + 65536) converts the computed offset to the type of ht.
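Applied to the header-splitting loop from the question, the same pointer type could be used as sketched below. This is only an illustration: the function name split_headers and the MAXLINES limit are made up, and lines longer than LINESIZE - 1 bytes are simply truncated.

```c
#include <stdlib.h>
#include <string.h>

#define LINESIZE 8192
#define MAXLINES 16   /* illustrative limit, not from the original post */

/* Copies each CRLF-terminated line of req into lines[n] and returns the
 * number of lines stored. Lines longer than LINESIZE - 1 are truncated. */
size_t split_headers(const char *req, char (*lines)[LINESIZE], size_t maxlines)
{
    size_t n = 0;
    const char *bp = req;
    const char *crlf;

    while (n < maxlines && (crlf = strstr(bp, "\r\n")) != NULL) {
        size_t len = (size_t)(crlf - bp);
        if (len >= LINESIZE)
            len = LINESIZE - 1;
        memcpy(lines[n], bp, len);
        lines[n][len] = '\0';
        n++;
        bp = crlf + 2;          /* continue after the CRLF */
    }
    return n;
}
```

A caller would set up char (*httpreq)[LINESIZE] = (char (*)[LINESIZE])(buf + 65536); pass httpreq to the function, and then read header n as httpreq[n]. The empty line that ends an HTTP request is stored as an empty string.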

obstack, gets and getline

I am trying to get a line from stdin. As far as I understand, we should never use gets, as the man page of gets says:
Never use gets(). Because it is impossible to tell without knowing
the data in advance how many characters gets() will read, and
because gets() will continue to store characters past the end of the
buffer, it is extremely dangerous to use. It has been used to
break computer security. Use fgets() instead.
It suggests that we use fgets() instead. The problem with fgets() is that we don't know the size of the user input in advance, and fgets() reads at most one less than size bytes from the stream, as the man page says:
fgets() reads in at most one less than size characters from stream
and stores them into the buffer pointed to by s. Reading stops
after an EOF or a newline. If a newline is read, it is stored into
the buffer. A terminating null byte ('\0') is stored after the last
character in the buffer.
Another approach is POSIX getline(), which uses realloc to grow the buffer, so we can read a string of arbitrary length from the input stream, as the man page says:
Alternatively, before calling getline(), *lineptr can contain a
pointer to a malloc(3)-allocated buffer *n bytes in size. If the
buffer is not large enough to hold the line, getline() resizes it
with realloc(3), updating *lineptr and *n as necessary.
Finally, there is another approach using obstack, as the libc manual says:
Aside from this one constraint of order of freeing, obstacks are
totally general: an obstack can contain any number of objects of
any size. They are implemented with macros, so allocation is
usually very fast as long as the objects are usually small. And the
only space overhead per object is the padding needed to start each
object on a suitable boundary...
So we can use an obstack for objects of any size, and allocation is very fast with a small space overhead, which is not a big deal. I wrote this code to read an input string without knowing its length:
#include <stdio.h>
#include <stdlib.h>
#include <obstack.h>
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free
int main(){
unsigned char c;
struct obstack * mystack;
mystack = (struct obstack *) malloc(sizeof(struct obstack));
obstack_init(mystack);
c = fgetc(stdin);
while(c!='\r' && c!='\n'){
obstack_1grow(mystack,c);
c = fgetc(stdin);
}
printf("the size of the stack is: %d\n",obstack_object_size(mystack));
printf("the input is: %s\n",(char *)obstack_finish(mystack));
return 0;
}
So my question is :
Is it safe to use obstack like this?
Is it like using POSIX getline?
Am I missing something here? any drawbacks?
Why shouldn't I use it?
Thanks in advance.
fgets has no drawbacks over gets. It just forces you to acknowledge that you must know the size of the buffer. gets instead requires you to somehow magically know beforehand the length of the input a (possibly malicious) user is going to feed into your program. That is why gets was removed from the C programming language. It is now non-standard, while fgets is standard and portable.
As for knowing the length of the line beforehand, POSIX says that a utility must be prepared to handle lines that fit in buffers of LINE_MAX size. Thus you can do:
char line[LINE_MAX];
while (fgets(line, LINE_MAX, fp) != NULL)
and any file that produces problems with that is not a standard text file. In practice everything will be mostly fine if you just don't blindly assume that the last character in the buffer is always '\n' (which it isn't).
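To avoid assuming that the last character is '\n', the check can be made explicit. A small sketch follows; the helper name strip_newline is made up for illustration.

```c
#include <string.h>

/* Removes a trailing newline from line, if there is one.
 * Returns 1 when a newline was removed (a complete line was read),
 * 0 when the line was truncated or the input ended without one. */
int strip_newline(char *line)
{
    size_t len = strcspn(line, "\n");   /* index of '\n' or of the '\0' */
    if (line[len] == '\n') {
        line[len] = '\0';
        return 1;
    }
    return 0;
}
```

Calling it on the buffer after each successful fgets() tells you whether the whole line fit. A return value of 0 means either the line was longer than the buffer or the file ended without a final newline.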
getline is a POSIX standard function. obstack is a GNU libc extension that is not portable. getline was built for efficient reading of lines from files, obstack was not, it was built to be generic. With obstack, the string is not properly contiguous in memory / in its final place, until you call obstack_finish.
Use getline if on POSIX, use fgets in programs that need to be maximally portable; look for an emulation of getline for non-POSIX platforms built on fgets.
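For the POSIX case, a minimal sketch of a getline() wrapper might look like this. The helper name read_line is made up, and the feature-test macro assumes a POSIX.1-2008 system.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for the POSIX.1-2008 declarations */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: reads one line of any length from f.
 * Returns a malloc'd string without the trailing newline, or NULL at
 * end of file or on error. The caller frees the result. */
char *read_line(FILE *f)
{
    char *line = NULL;   /* getline allocates when *lineptr is NULL */
    size_t cap = 0;
    ssize_t len = getline(&line, &cap, f);

    if (len < 0) {       /* EOF or error; the buffer must still be freed */
        free(line);
        return NULL;
    }
    line[strcspn(line, "\n")] = '\0';   /* drop the newline, if any */
    return line;
}
```

In a loop that reads many lines it is cheaper to keep line and cap alive between calls so that getline() can reuse the buffer; the one-shot form above trades that for a simpler interface.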
Why shouldn't I use it?
Well, you shouldn't use getline() if you care about portability. You should use getline() if you're specifically targeting only POSIX systems.
As for obstacks, they're specific to the GNU C library, which might already be a strong reason to avoid them (it further restricts portability). Also, they're not meant to be used for this purpose.
If you aim for portability, just use fgets(). It's not too complicated to write a function similar to getline() based on fgets() -- here's an example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define CHUNKSIZE 1024
char *readline(FILE *f)
{
size_t bufsize = CHUNKSIZE;
char *buf = malloc(bufsize);
if (!buf) return 0;
char *pos = buf;
size_t len = 0;
while (fgets(pos, CHUNKSIZE, f))
{
char *nl = strchr(pos, '\n');
if (nl)
{
// newline found, replace with string terminator
*nl = '\0';
char *tmp = realloc(buf, len + strlen(pos) + 1);
if (tmp) return tmp;
return buf;
}
// no newline, increase buffer size
len += strlen(pos);
char *tmp = realloc(buf, len + CHUNKSIZE);
if (!tmp)
{
free(buf);
return 0;
}
buf = tmp;
pos = buf + len;
}
// input ended without a newline
if (len == 0)
{
// nothing was read at all (end of file or error), so buf holds no string
free(buf);
return 0;
}
char *tmp = realloc(buf, len + 1);
if (tmp) return tmp;
return buf;
}
int main(void)
{
char *input = readline(stdin);
if (!input)
{
fputs("Error reading input!\n", stderr);
return 1;
}
puts(input);
free(input);
return 0;
}
This one removes the newline if it was found and returns a newly allocated buffer (which the caller has to free()). Adapt to your needs. It could be improved by increasing the buffer size only when the buffer was filled completely, with just a bit more code ...

How do I write a C function that returns a variable-length string?

I need to be able to check in a kernel module whether or not a file descriptor, dentry, or inode falls under a certain path. To do this, I am going to have to write a function that when given a dentry or a file descriptor (not sure which, yet), will return said object's full path name.
What is the way to write a function that returns variable-length strings?
You can try like this:
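char *myFunction(size_t length)
{
    char *word;
    word = malloc (length + 1); /* +1 for the terminating '\0' */
    if (word == NULL)
        return NULL;
    //add some random characters to word[0] .. word[length - 1]
    word[length] = '\0';
    return word; /* the caller must free() it */
}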
You can also refer to this related thread: best practice for returning a variable length string in c
The typical way to do this in C is not to return anything at all:
void func (char* buf, size_t buf_size, size_t* length);
Here buf is a pointer to the buffer that will hold the string, allocated by the caller; buf_size is the size of that buffer; and length reports how much of that buffer the function used.
You could return a pointer to buf as done by for example strcpy. But this doesn't make much sense, since the same pointer already exists in one of the parameters. It adds nothing but confusion.
(Don't use strcpy, strcat etc functions as some role model for how to write functions. Many C standard library functions have obscure prototypes, because they are so terribly old, from a time when good programming practice wasn't invented, or at least not known by Dennis Ritchie.)
There are two common approaches:
One is to have a fixed size buffer to store the result:
int makeFullPath(char *buffer,size_t max_size,...)
{
int actual_size = snprintf(buffer,max_size,...);
return actual_size;
}
Examples of standard functions which use this approach are strncpy() and snprintf(). This approach has the advantage that no dynamic memory allocation is needed, which will give better performance for time-critical functions. The downside is that it puts more responsibility on the caller to be able to determine the largest possible result size in advance or be ready to reallocate if a larger size is necessary.
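A concrete version of the fixed-buffer approach could look like the sketch below. The parameters dir and name stand in for the elided arguments and are assumptions, as is the helper name; this is user-space C, not kernel code.

```c
#include <stdio.h>

/* Hypothetical helper: writes "dir/name" into buffer.
 * Returns the length the full path needs (excluding the '\0'), or a
 * negative value on an output error. If the return value is >= max_size,
 * the result was truncated and the caller should retry with a buffer of
 * at least return value + 1 bytes. */
int make_full_path(char *buffer, size_t max_size,
                   const char *dir, const char *name)
{
    return snprintf(buffer, max_size, "%s/%s", dir, name);
}
```

Because snprintf() reports the length it wanted to write, the caller can detect truncation by comparing the return value with max_size and knows exactly how much to allocate for a second attempt.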
The second common approach is to calculate how big of a buffer to use and allocate that many bytes internally:
// Caller eventually needs to free() the result.
char* makeFullPath(...)
{
size_t max_size = calculateFullPathSize(...);
char *buffer = malloc(max_size);
if (!buffer) return NULL;
int actual_size = snprintf(buffer,max_size,...);
assert(actual_size<max_size);
return buffer;
}
An example of a standard function that uses this approach is strdup(). The advantage is that the caller no longer needs to worry about the size, but they now need to make sure that they free the result. For a kernel module, you would use kmalloc() and kfree() instead of malloc() and free().
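In user space, the size calculation can be delegated to snprintf() itself. The following is a sketch under that assumption; the helper name and the dir and name parameters are made up, and a kernel module would need kmalloc() and its own formatting routines instead.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns "dir/name" in a malloc'd buffer of exactly
 * the right size, or NULL on failure. The caller frees the result. */
char *make_full_path_alloc(const char *dir, const char *name)
{
    /* With a NULL buffer and size 0, snprintf only reports the length. */
    int len = snprintf(NULL, 0, "%s/%s", dir, name);
    if (len < 0)
        return NULL;

    char *buffer = malloc((size_t)len + 1);
    if (buffer == NULL)
        return NULL;

    snprintf(buffer, (size_t)len + 1, "%s/%s", dir, name);
    return buffer;
}
```

Formatting twice costs some time, but it removes the need for a separate size calculation that has to be kept in sync with the format string.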
A less common approach is to have a static buffer:
const char *makeFullPath(...)
{
    static char buffer[MAX_PATH]; /* shared by every call */
    int actual_size = snprintf(buffer,MAX_PATH,...);
    return buffer;
}
This avoids the caller having to worry about the size or freeing the result, and it is also efficient, but it has the downside that the caller now has to make sure that they don't call the function a second time while the result of the first call is still being used.
const char *result1 = makeFullPath(...);
const char *result2 = makeFullPath(...);
printf("%s",result1);
printf("%s",result2); /* oops! */
Here, the caller probably meant to print two separate strings, but they'll actually just get the second string twice. This is also problematic in multi-threaded code, and probably unusable for kernel code.
For example:
char * fn( int file_id )
{
static char res[MAX_PATH];
// fill res[]
return res;
}
/*
let do it the BSTR way (BasicString of VB)
*/
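#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* Copies str into a block that stores its length in a short just before the text. */
char * CopyString(const char *str){
    unsigned short len;
    char *buff;
    /* strlen replaces the Windows-only lstrlen; lengths above USHRT_MAX are not handled */
    len=(unsigned short)strlen(str);
    buff=malloc(sizeof(short)+len+1);
    if(buff){
        ((short*)buff)[0]=len+1; /* stored length includes the '\0' */
        buff=(char*)&((short*)buff)[1];
        strcpy(buff,str);
    }
    return buff;
}
#define len_of_string(s) (((short*)(s))[-1])
#define free_string(s) free(&((short*)(s))[-1])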
int main(){
char *buff=CopyString("full_path_name");
if(buff){
printf("len of string= %d\n",len_of_string(buff));
free_string(buff);
}else{
printf("Error: malloc failed\n");
}
return 0;
}
/*
now you can imagine how to reallocate the string to a new size
*/

Howto use readlink with dynamic memory allocation

Problem:
On a Linux machine I want to read the target string of a link. In the documentation I found the following code sample (without error handling):
struct stat sb;
ssize_t r;
char * linkname;
lstat("<some link>", &sb);
linkname = malloc(sb.st_size + 1);
r = readlink("/proc/self/exe", linkname, sb.st_size + 1);
The problem is that sb.st_size returns 0 for links on my system.
So how does one allocate memory dynamically for readlink on such systems?
Many thanks!
One possible solution:
For future reference. Using the points made by jilles:
struct stat sb;
ssize_t r = INT_MAX;
int linkSize = 0;
const int growthRate = 255;
char * linkTarget = NULL;
// get length of the pathname the link points to
if (lstat("/proc/self/exe", &sb) == -1) { // could not lstat: insufficient permissions on directory?
perror("lstat");
return;
}
// read the link target into a string
linkSize = sb.st_size + 1 - growthRate;
while (r >= linkSize) { // i.e. symlink increased in size since lstat() or non-POSIX compliant filesystem
// allocate sufficient memory to hold the link
linkSize += growthRate;
free(linkTarget);
linkTarget = malloc(linkSize);
if (linkTarget == NULL) { // insufficient memory
fprintf(stderr, "setProcessName(): insufficient memory\n");
return;
}
// read the link target into variable linkTarget
r = readlink("/proc/self/exe", linkTarget, linkSize);
if (r < 0) { // readlink failed: link was deleted?
perror("readlink");
free(linkTarget);
return;
}
}
linkTarget[r] = '\0'; // readlink does not null-terminate the string
POSIX says the st_size field for a symlink shall be set to the length of the pathname in the link (without '\0'). However, the /proc filesystem on Linux is not POSIX-compliant. (It has more violations than just this one, such as when reading certain files one byte at a time.)
You can allocate a buffer of a certain size, try readlink() and retry with a larger buffer if the buffer was not large enough (readlink() returned as many bytes as fit in the buffer), until the buffer is large enough.
Alternatively you can use PATH_MAX and break portability to systems where it is not a compile-time constant or where the pathname may be longer than that (POSIX permits either).
The other answers don't mention it, but there is the realpath function, that does exactly what you want, which is specified by POSIX.1-2001.
char *realpath(const char *path, char *resolved_path);
from the manpage:
realpath() expands all symbolic links and resolves references to
/./, /../ and extra '/' characters in the null-terminated string named
by path to produce a canonicalized absolute pathname.
realpath also handles the dynamic memory allocation for you, if you want. Again, excerpt from the manpage:
If resolved_path is specified as NULL, then realpath() uses
malloc(3) to allocate a buffer of up to PATH_MAX bytes to hold the
resolved pathname, and returns a pointer to this buffer. The caller
should deallocate this buffer using free(3).
As a simple, complete example:
#include <limits.h>
#include <stdlib.h>
#include <stdio.h>
int
resolve_link (const char *filename)
{
char *res = realpath(filename, NULL);
if (res == NULL)
{
perror("realpath failed");
return -1;
}
printf("%s -> %s\n", filename, res);
free(res);
return 0;
}
int
main (void)
{
resolve_link("/proc/self/exe");
return 0;
}
st_size does not give the correct answer on /proc.
Instead you can malloc PATH_MAX or pathconf(path, _PC_PATH_MAX) bytes. That should be enough for most cases. If you want to be able to handle paths longer than that, you can call readlink in a loop and reallocate your buffer if the readlink return value indicates that the buffer is too short. Note though that many other POSIX functions simply assume PATH_MAX is enough.
I'm a bit puzzled as to why st_size is zero. Per POSIX:
For symbolic links, the st_mode member shall contain meaningful information when used with the file type macros. The file mode bits in st_mode are unspecified. The structure members st_ino, st_dev, st_uid, st_gid, st_atim, st_ctim, and st_mtim shall have meaningful values and the value of the st_nlink member shall be set to the number of (hard) links to the symbolic link. The value of the st_size member shall be set to the length of the pathname contained in the symbolic link not including any terminating null byte.
Source: http://pubs.opengroup.org/onlinepubs/9699919799/functions/lstat.html
If st_size does not work, I think your only option is to dynamically allocate a buffer and keep resizing it larger as long as the return value of readlink is equal to the buffer size.
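The grow-and-retry loop could be sketched as follows, assuming a POSIX system. The helper name read_link_alloc and the starting size of 64 bytes are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* for readlink() under strict compilers */
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the target of the symlink at path as a
 * malloc'd, null-terminated string, or NULL on failure. */
char *read_link_alloc(const char *path)
{
    size_t size = 64;               /* arbitrary starting guess */
    char *buf = NULL;

    for (;;) {
        char *tmp = realloc(buf, size);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;

        ssize_t r = readlink(path, buf, size);
        if (r < 0) {                /* not a symlink, no permission, ... */
            free(buf);
            return NULL;
        }
        if ((size_t)r < size) {     /* fit, with room for the terminator */
            buf[r] = '\0';          /* readlink does not terminate */
            return buf;
        }
        size *= 2;                  /* possibly truncated: grow and retry */
    }
}
```

A return value equal to the buffer size is treated as "possibly truncated", which also leaves one byte free for the terminator once the call succeeds. Since it never consults st_size, the same loop works on /proc.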
The manpage for readlink(2) says it will silently truncate if the buffer is too small. If you truly want to be unbounded (and don't mind paying some cost for extra work) you can start with a given allocation size and keep increasing it and re-trying the readlink call. You can stop growing the buffer when the next call to readlink returns the same string it did for the last iteration.
What exactly are you trying to achieve with the lstat?
You should be able to get the target with just the following
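char buffer[1024];
ssize_t r = readlink ("/proc/self/exe", buffer, sizeof buffer - 1); // leave room for the terminator
if (r >= 0) { // readlink returns -1 on failure
    buffer[r] = 0; // readlink does not null-terminate
    printf ("%s\n", buffer);
}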
If you're trying to get the length of the file name, I don't think st_size is the right variable for that... But that's possibly a different question.

Proper memory allocation?

How would I only allocate as much memory as really needed without knowing how big the arguments to the function are?
Usually, I would use a fixed size, and calculate the rest with sizeof (note: the code isn't supposed to make sense, but to show the problem):
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
int test(const char* format, ...)
{
char* buffer;
int bufsize;
int status;
va_list arguments;
va_start(arguments, format);
bufsize = 1024; /* fixed size */
bufsize = sizeof(arguments) + sizeof(format) + 1024;
buffer = (char*)malloc(bufsize);
status = vsprintf(buffer, format, arguments);
fputs(buffer, stdout);
va_end(arguments);
return status;
}
int main()
{
const char* name = "World";
test("Hello, %s\n", name);
return 0;
}
However, I don't think this is the way to go... so, how would I calculate the required buffersize properly here?
If you have vsnprintf available to you, I would make use of that. It prevents buffer overflow since you provide the buffer size, and it returns the actual size needed.
So allocate your 1K buffer, then attempt to use vsnprintf to write into that buffer, limiting the size. The return value is the length of the full output, not counting the terminating null. If that length is less than your buffer size, then it's worked and you can just use the buffer.
If the length is greater than or equal to the buffer size, the output was truncated, so call realloc to get a buffer of length + 1 bytes and try again. Provided the data hasn't changed (e.g., threading issues), the second attempt will work fine since you already know how big the output will be.
This is relatively efficient provided you choose your default buffer size carefully. If the vast majority of your outputs are within that limit, very few reallocations have to take place (see below for a possible optimisation).
If you don't have a vsnprintf-type function, a trick we've used before is to open a file handle to /dev/null and use that for the same purpose (checking the size before outputting to a buffer). Use vfprintf to that file handle to get the size (the output goes to the bit bucket), then allocate enough space based on the return value, and vsprintf to that buffer. Again, it should be large enough since you've figured out the needed size.
An optimisation to the methods above would be to use a local buffer, rather than an allocated buffer, for the 1K chunk. This avoids having to use malloc in those situations where it's unnecessary, assuming your stack can handle it.
In other words, use something like:
int test(const char* format, ...)
{
char buff1k[1024];
char *buffer = buff1k; // default to local buffer, no malloc.
:
int need = 1 + vsnprintf (buffer, sizeof (buff1k), format, arguments);
if (need > sizeof (buff1k)) {
buffer = malloc (need);
// Now you have a big-enough buffer, vsprintf into there.
}
// Use string at buffer for whatever you want.
...
// Only free buffer if it was allocated.
if (buffer != buff1k)
free (buffer);
}
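For comparison, here is a complete two-pass sketch without the local buffer, assuming a C99 vsnprintf. The helper name format_alloc is made up. The point to note is va_copy: a va_list that has been passed to vsnprintf cannot be used for a second call.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: formats into a malloc'd buffer of exactly the
 * needed size. Returns NULL on failure; the caller frees the result. */
char *format_alloc(const char *format, ...)
{
    va_list args, args_copy;
    va_start(args, format);
    va_copy(args_copy, args);   /* a va_list cannot be reused once consumed */

    /* First pass: NULL buffer and size 0 only measure the output. */
    int len = vsnprintf(NULL, 0, format, args);
    va_end(args);

    char *buffer = NULL;
    if (len >= 0)
        buffer = malloc((size_t)len + 1);
    if (buffer != NULL)
        vsnprintf(buffer, (size_t)len + 1, format, args_copy);

    va_end(args_copy);
    return buffer;
}
```

This always formats twice, so the local-buffer variant above is likely faster when most outputs are short. The va_copy requirement applies to it as well.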

Resources