How to use readlink with dynamic memory allocation - c

Problem:
On a Linux machine I want to read the target string of a link. From the documentation I have found the following code sample (without error processing):
struct stat sb;
ssize_t r;
char * linkname;
lstat("<some link>", &sb);
linkname = malloc(sb.st_size + 1);
r = readlink("/proc/self/exe", linkname, sb.st_size + 1);
The problem is that sb.st_size returns 0 for links on my system.
So how does one allocate memory dynamically for readlink on such systems?
Many thanks!
One possible solution:
For future reference, using the points made by jilles:
// Body of a void function; needs <limits.h>, <stdio.h>, <stdlib.h>, <sys/stat.h> and <unistd.h>.
struct stat sb;
ssize_t r = INT_MAX;
int linkSize = 0;
const int growthRate = 255;
char * linkTarget = NULL;
// get length of the pathname the link points to
if (lstat("/proc/self/exe", &sb) == -1) { // could not lstat: insufficient permissions on directory?
    perror("lstat");
    return;
}
// read the link target into a string
linkSize = sb.st_size + 1 - growthRate;
while (r >= linkSize) { // i.e. symlink increased in size since lstat() or non-POSIX compliant filesystem
    // allocate sufficient memory to hold the link
    linkSize += growthRate;
    free(linkTarget);
    linkTarget = malloc(linkSize);
    if (linkTarget == NULL) { // insufficient memory
        fprintf(stderr, "setProcessName(): insufficient memory\n");
        return;
    }
    // read the link target into variable linkTarget
    r = readlink("/proc/self/exe", linkTarget, linkSize);
    if (r < 0) { // readlink failed: link was deleted?
        perror("readlink");
        free(linkTarget);
        return;
    }
}
linkTarget[r] = '\0'; // r < linkSize here, so there is room; readlink does not null-terminate

POSIX says the st_size field for a symlink shall be set to the length of the pathname in the link (without '\0'). However, the /proc filesystem on Linux is not POSIX-compliant. (It has more violations than just this one, such as when reading certain files one byte at a time.)
You can allocate a buffer of a certain size, try readlink() and retry with a larger buffer if the buffer was not large enough (readlink() returned as many bytes as fit in the buffer), until the buffer is large enough.
Alternatively you can use PATH_MAX and break portability to systems where it is not a compile-time constant or where the pathname may be longer than that (POSIX permits either).
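The retry approach described above can be sketched as a small helper. The name read_link_alloc, the 64-byte starting size, and the doubling strategy are assumptions for illustration, not part of any library:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: returns a malloc'd, NUL-terminated copy of the
 * target of the symbolic link `path`, or NULL on error (errno is set).
 * The caller must free() the result. */
char *read_link_alloc(const char *path)
{
    size_t size = 64;              /* assumed starting size; any size > 0 works */
    char *buf = NULL;

    for (;;) {
        char *tmp = realloc(buf, size);
        if (tmp == NULL) {         /* out of memory (POSIX realloc sets ENOMEM) */
            free(buf);
            return NULL;
        }
        buf = tmp;

        ssize_t n = readlink(path, buf, size);
        if (n < 0) {               /* e.g. ENOENT, or EINVAL if not a link */
            int saved = errno;
            free(buf);
            errno = saved;
            return NULL;
        }
        if ((size_t)n < size) {    /* room to spare, so nothing was truncated */
            buf[n] = '\0';         /* readlink() does not NUL-terminate */
            return buf;
        }
        if (size > (size_t)-1 / 2) {   /* doubling would overflow */
            free(buf);
            errno = ENAMETOOLONG;
            return NULL;
        }
        size *= 2;                 /* buffer was filled: maybe truncated, retry */
    }
}
```

A call such as read_link_alloc("/proc/self/exe") would return the executable's path regardless of what st_size reports. Doubling keeps the number of readlink() calls logarithmic in the length of the target.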

The other answers don't mention it, but there is the realpath function, specified by POSIX.1-2001, which does what you want (note that it resolves every symbolic link in the path, not only the final one).
char *realpath(const char *path, char *resolved_path);
From the man page:
realpath() expands all symbolic links and resolves references to
/./, /../ and extra '/' characters in the null-terminated string named
by path to produce a canonicalized absolute pathname.
realpath also handles the dynamic memory allocation for you, if you want (POSIX.1-2001 leaves the NULL case implementation-defined; POSIX.1-2008 standardized it). Again, an excerpt from the man page:
If resolved_path is specified as NULL, then realpath() uses
malloc(3) to allocate a buffer of up to PATH_MAX bytes to hold the
resolved pathname, and returns a pointer to this buffer. The caller
should deallocate this buffer using free(3).
As a simple, complete example:
#include <limits.h>
#include <stdlib.h>
#include <stdio.h>
int
resolve_link (const char *filename)
{
    char *res = realpath(filename, NULL);
    if (res == NULL)
    {
        perror("realpath failed");
        return -1;
    }
    printf("%s -> %s\n", filename, res);
    free(res);
    return 0;
}
int
main (void)
{
    resolve_link("/proc/self/exe");
    return 0;
}

st_size does not give the correct answer on /proc.
Instead you can malloc PATH_MAX, or pathconf(path, _PC_PATH_MAX), bytes. That should be enough for most cases. If you want to be able to handle paths longer than that, you can call readlink in a loop and reallocate your buffer whenever the readlink return value equals the buffer size, which indicates that the buffer may have been too short. Note, though, that many other POSIX functions simply assume PATH_MAX is enough.
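A sketch of the pathconf() variant, assuming it is acceptable to fail (rather than loop) when the target might not fit. The helper name read_link_pathmax, querying the limit for "/", and the 4096-byte fallback for systems without PATH_MAX are all hypothetical choices:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: reads the target of the symlink `path` into a buffer
 * sized from pathconf(), falling back to PATH_MAX (or 4096 if PATH_MAX is
 * undefined). Returns a malloc'd string, or NULL with errno set. It fails
 * with ENAMETOOLONG instead of retrying if the target might not fit. */
char *read_link_pathmax(const char *path)
{
    /* Assumption: the limit reported for "/" is a sensible bound for an
     * absolute target. pathconf() returns -1 if there is no fixed limit. */
    long max = pathconf("/", _PC_PATH_MAX);
    size_t size;
    if (max > 0)
        size = (size_t)max;
    else
#ifdef PATH_MAX
        size = PATH_MAX;
#else
        size = 4096;
#endif

    char *buf = malloc(size);
    if (buf == NULL)
        return NULL;
    ssize_t n = readlink(path, buf, size);
    if (n < 0 || (size_t)n >= size) {   /* error, or buffer filled: maybe truncated */
        int saved = (n < 0) ? errno : ENAMETOOLONG;
        free(buf);
        errno = saved;
        return NULL;
    }
    buf[n] = '\0';                       /* readlink() does not NUL-terminate */
    return buf;
}
```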

I'm a bit puzzled as to why st_size is zero. Per POSIX:
For symbolic links, the st_mode member shall contain meaningful information when used with the file type macros. The file mode bits in st_mode are unspecified. The structure members st_ino, st_dev, st_uid, st_gid, st_atim, st_ctim, and st_mtim shall have meaningful values and the value of the st_nlink member shall be set to the number of (hard) links to the symbolic link. The value of the st_size member shall be set to the length of the pathname contained in the symbolic link not including any terminating null byte.
Source: http://pubs.opengroup.org/onlinepubs/9699919799/functions/lstat.html
If st_size does not work, I think your only option is to dynamically allocate a buffer and keep resizing it larger as long as the return value of readlink is equal to the buffer size.

The man page for readlink(2) says it will silently truncate if the buffer is too small. If you truly want to be unbounded (and don't mind paying some cost for extra work), you can start with a given allocation size and keep increasing it and retrying the readlink call. You can stop growing the buffer as soon as readlink returns fewer bytes than the buffer size, since the result then cannot have been truncated (stopping when the next call returns the same string as the previous one also works, but costs an extra call).

What exactly are you trying to achieve with the lstat?
You should be able to get the target with just the following
char buffer[1024];
ssize_t r = readlink("/proc/self/exe", buffer, sizeof(buffer) - 1); // leave room for the '\0'
if (r == -1) {
    perror("readlink");
} else {
    buffer[r] = '\0'; // readlink does not null-terminate
    printf("%s\n", buffer);
}
If you're trying to get the length of the file name, I don't think st_size is the right variable for that... But that's possibly a different question.

Related

C redirect fprintf into a buffer or char array

I have the following function and I am wondering if there is a way to pass a string or char array instead of stdout into it, so I can get the printed representation as a string.
void print_Type(Type t, FILE *f)
{
    fprintf(f,"stuff ...");
}
print_Type(t, stdout);
I have already tried this:
int SIZE = 100;
char buffer[SIZE];
print_Type(t, buffer);
But this is what I am seeing:
�����
Something like this
FILE* f = fmemopen(buffer, sizeof(buffer), "w");
print_Type(t, f);
fclose(f);
The fmemopen(void *buf, size_t size, const char *mode) function opens a stream. The stream allows I/O to be performed on the string or memory buffer pointed to by buf.
Yes, there is sprintf(); notice the leading s rather than f.
int SIZE = 100;
char buffer[SIZE];
sprintf(buffer, "stuff %d", 10);
This function prints to a string s rather than a file f. It behaves like fprintf() and takes the same format and arguments; the only difference is the destination, which must be a char array (either statically allocated as an array or dynamically allocated, usually via malloc).
Note: This function is dangerous as it does not check the length and can easily overrun the end of the buffer if you are not careful.
If you are using C99 or later, a better function is snprintf(), which takes the buffer length and will not write past it.
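Applied to the question, snprintf() lets a print_Type-like function write into a caller-supplied array and report truncation. The Type struct and the format_Type function below are hypothetical stand-ins, since the real Type is not shown:

```c
#include <stdio.h>

/* Hypothetical stand-in for the question's Type; the real definition is not shown. */
typedef struct {
    int id;
    const char *name;
} Type;

/* Writes t's printed representation into buf (at most size bytes, always
 * NUL-terminated when size > 0) and returns the length the complete output
 * needs, excluding the NUL. A result >= size means the output was truncated. */
int format_Type(Type t, char *buf, size_t size)
{
    return snprintf(buf, size, "Type %d: %s", t.id, t.name);
}
```

Because snprintf() returns the length the full output would have had, a return value greater than or equal to size tells the caller to retry with a larger array.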
The problem with fmemopen is that it cannot resize the buffer. fmemopen did exist in Glibc for quite some time, but it was standardized only in POSIX.1-2008. But that revision included another function that handles dynamic memory allocation: open_memstream(3):
char *buffer = NULL;
size_t size = 0;
FILE* f = open_memstream(&buffer, &size);
print_Type(t, f);
fclose(f);
buffer will now point to a null-terminated buffer, with size bytes before the extra null terminator. I.e. if you didn't write any null bytes, then strlen(buffer) == size. Release it with free(buffer) when you are done.
Thus the only merit of fmemopen is that it can be used to write to a memory buffer at a fixed location or of fixed length, whereas open_memstream should be used everywhere else, where the location of the buffer does not matter.
fmemopen has yet another undesirable feature: writes may fail only when the buffer is flushed, not when you call fprintf. Since the target is in memory anyway, there is no point in buffering the writes, so if you choose to use fmemopen, the Linux manual page fmemopen(3) recommends disabling buffering with setbuf(f, NULL);
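A self-contained sketch of the open_memstream() approach, assuming POSIX.1-2008. Type, print_Type, and Type_to_string here are hypothetical stand-ins for the question's code:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the question's Type and print_Type(). */
typedef struct {
    int id;
} Type;

void print_Type(Type t, FILE *f)
{
    fprintf(f, "stuff %d ...", t.id);
}

/* Returns a malloc'd string holding exactly what print_Type() prints,
 * or NULL on failure. The caller must free() the result. */
char *Type_to_string(Type t)
{
    char *buffer = NULL;
    size_t size = 0;
    FILE *f = open_memstream(&buffer, &size);
    if (f == NULL)
        return NULL;
    print_Type(t, f);
    if (fclose(f) != 0) {   /* fclose() finalizes buffer and size */
        free(buffer);
        return NULL;
    }
    return buffer;          /* no NUL bytes were written, so size == strlen(buffer) */
}
```

The existing print_Type() needs no changes: it still writes to a FILE *, and the memory stream grows as needed.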

How to properly set buffer and bufsize with getpwuid_r()?

Background Info
I am trying to get the string of a user's username, with the only info provided about that user being their uid number. I have the uid as a result of a preceding call to fstat (and the uid is stored in a struct stat).
I need to get the username in a thread-safe manner, and so I am trying to use getpwuid_r(). According to the getpwuid (3) man page:
int getpwuid_r(uid_t uid, struct passwd *pwd, char *buffer,
size_t bufsize, struct passwd **result);
The getpwuid_r() function shall update the passwd structure pointed
to by pwd and store a pointer to that structure at the location
pointed to by result. The structure shall contain an entry from
the user database with a matching uid. Storage referenced by the
structure is allocated from the memory provided with the buffer
parameter, which is bufsize bytes in size. A call to
sysconf(_SC_GETPW_R_SIZE_MAX) returns either −1 without changing
errno or an initial value suggested for the size of this buffer. A
null pointer shall be returned at the location pointed to by result
on error or if the requested entry is not found.
If successful, the getpwuid_r() function shall return zero;
otherwise, an error number shall be returned to indicate the error.
Problem Statement
Upon reading the man page example below, I am confused as to why they need to iterate, while increasing the size of the buffer, until the buffer can hold its information.
I am under the presumption that the buffer holds the struct passwd pwd; considering this, why can't we just set buffer = (void *) malloc(sizeof(struct passwd)) and bufsize = sizeof(struct passwd)?
long int initlen = sysconf(_SC_GETPW_R_SIZE_MAX);
size_t len;
if (initlen == -1)
    /* Default initial length. */
    len = 1024;
else
    len = (size_t) initlen;
struct passwd result;
struct passwd *resultp;
char *buffer = malloc(len);
if (buffer == NULL)
    ...handle error...
int e;
while ((e = getpwuid_r(42, &result, buffer, len, &resultp)) == ERANGE)
{
    size_t newlen = 2 * len;
    if (newlen < len)
        ...handle error...
    len = newlen;
    char *newbuffer = realloc(buffer, len);
    if (newbuffer == NULL)
        ...handle error...
    buffer = newbuffer;
}
if (e != 0)
    ...handle error...
free (buffer);
Is there something I'm not understanding about how this function sets the data within pwd? Perhaps I don't fully understand how the struct passwd we are setting is related to the buffer space.
The passwd struct is defined by the standard to contain at least these members:
char *pw_name // User's login name.
uid_t pw_uid // Numerical user ID.
gid_t pw_gid // Numerical group ID.
char *pw_dir // Initial working directory.
char *pw_shell // Program to use as shell.
Note the three char * members; they point to storage that lies elsewhere, outside of the struct.
Many implementations will have two more char * members: pw_passwd and pw_gecos.
The difference between getpwuid and getpwuid_r is that the former may use a static buffer to store the name, passwd, dir, gecos, and shell strings[1] (as well as the passwd struct itself), while the latter, since it's reentrant, requires the caller to supply one buffer to hold the struct passwd and another buffer to hold the character strings.
In practice, the two functions share a lot of common code.
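As for "I am confused as to why they need to iterate, while increasing the size of the buffer, until the buffer can hold its information": the buffer holds only the character strings, whose total length cannot be known in advance. If the call to sysconf(_SC_GETPW_R_SIZE_MAX) fails, you just have to guess how big the buffer for the character strings should be, and keep increasing its size until it's big enough. Even when sysconf succeeds, it only returns a suggested initial size, so getpwuid_r can still fail with ERANGE.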
[1] In V7, when all the info was in /etc/passwd, this static buffer was just a copy of the appropriate line of /etc/passwd with a NUL inserted at the end of each of the five string fields.
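A complete, thread-safe lookup following the same grow-on-ERANGE pattern as the POSIX example. The function name username_for_uid is hypothetical, and the 1024-byte fallback is taken from that example. Because pw_name points into buf, the name is copied before buf is freed:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns a malloc'd copy of the login name for uid,
 * or NULL if the user was not found or an error occurred. All storage is
 * local to the call, so it is safe to use from several threads. */
char *username_for_uid(uid_t uid)
{
    long hint = sysconf(_SC_GETPW_R_SIZE_MAX);
    size_t len = (hint > 0) ? (size_t)hint : 1024;
    char *buf = NULL;
    struct passwd pwd, *result = NULL;
    int e;

    for (;;) {
        char *tmp = realloc(buf, len);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        e = getpwuid_r(uid, &pwd, buf, len, &result);
        if (e != ERANGE)
            break;                 /* success, "not found", or a real error */
        if (len > (size_t)-1 / 2) {
            free(buf);
            return NULL;
        }
        len *= 2;                  /* the strings did not fit: grow and retry */
    }

    /* e == 0 with result == NULL means "no such user". */
    char *name = (e == 0 && result != NULL) ? strdup(pwd.pw_name) : NULL;
    free(buf);                     /* pwd's string pointers are now invalid */
    return name;
}
```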

Why is read() syscall blocking when I pass in a invalid buffer pointer?

Here is my code snippet, read(STDIN, NULL, 10), executed on Linux-2.6.32.431. I expected it to return immediately, based on my reading of the read() syscall's source code:
SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
{
    struct file *file;
    ssize_t ret = -EBADF;
    int fput_needed;
    file = fget_light(fd, &fput_needed);
    if (file) {
        loff_t pos = file_pos_read(file);
        ret = vfs_read(file, buf, count, &pos);
        file_pos_write(file, pos);
        fput_light(file, fput_needed);
    }
    return ret;
}
and
ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
{
    ssize_t ret;
    if (!(file->f_mode & FMODE_READ))
        return -EBADF;
    if (!file->f_op || (!file->f_op->read && !file->f_op->aio_read))
        return -EINVAL;
    if (unlikely(!access_ok(VERIFY_WRITE, buf, count))) //I suppose it should return here
        return -EFAULT;
    ...
}
However, it blocked. After I typed some characters and hit Return, the program consumed one character and returned, while the remaining characters were passed on to the terminal.
My question is:
Why did the read() syscall block?
Why were the remaining characters passed on to the terminal?
I believe access_ok does not do exactly what its name implies.
From the comments in arch/x86/include/asm/uaccess.h:
/**
* access_ok: - Checks if a user space pointer is valid
* @type: Type of access: %VERIFY_READ or %VERIFY_WRITE. Note that
* %VERIFY_WRITE is a superset of %VERIFY_READ - if it is safe
* to write to a block, it is always safe to read from it.
* @addr: User space pointer to start of block to check
* @size: Size of block to check
*
* Context: User context only. This function may sleep.
*
* Checks if a pointer to a block of memory in user space is valid.
*
* Returns true (nonzero) if the memory block may be valid, false (zero)
* if it is definitely invalid.
*
* Note that, depending on architecture, this function probably just
* checks that the pointer is in the user space range - after calling
* this function, memory access functions may still return -EFAULT.
*/
The comments appear to be accurate; on x86, if you trace the definition of access_ok, you will find it just checks (essentially) whether addr + size > user_addr_max(). In particular, it returns "true" for a NULL pointer.
So you have to trace vfs_read a little further, into the call to file->f_op->read(), which is presumably invoking the read function for the TTY driver, which is presumably where it is blocking.
(Note that POSIX guarantees nothing when you pass a NULL pointer to read, so I would advise not doing that.)
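The point that a NULL buffer passes access_ok and is only rejected once data is actually copied can be checked without a TTY. This sketch assumes Linux, and read_into_null is a hypothetical name: it puts one byte in a pipe so that read() does not block, then reports the errno that read() sets:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Hypothetical demo: returns the errno set by read() when asked to store
 * one available byte at a NULL pointer, 0 if read() succeeded, or -1 if
 * the pipe could not be set up. */
int read_into_null(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], "x", 1) != 1) {   /* data is waiting, so read() won't block */
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    char *volatile buf = NULL;  /* volatile hides the NULL from compile-time checks */
    ssize_t r = read(fds[0], buf, 1);
    int err = (r < 0) ? errno : 0;
    close(fds[0]);
    close(fds[1]);
    return err;
}
```

On Linux we would expect EFAULT here: the kernel's copy to user space fails and read() returns -1, rather than blocking.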
[Update]
For your second question, it's the same reason this sequence reads one character and then passes the rest to the terminal:
$ head -c 1 > /dev/null
lalala
$ alala
alala: command not found
All I did was input "lalala" to the head command. Your program is presumably consuming one character of TTY input, terminating (crashing), and then the rest of the input to the TTY is being consumed by the shell after your program exits.
If you check the read manual page you will see that:
EFAULT buf is outside your accessible address space.
A NULL pointer is still within the accessible address space of all processes. Writing to, or dereferencing, a NULL pointer leads to undefined behavior, but it's still a valid address.
So the read call blocks because there's no input to be read. When there is, the kernel's attempt to copy it to the NULL address will most likely fail, and read() will return -1 with errno set to EFAULT.

Call to malloc, unknown size

I am getting the current working directory with _getcwd. The function requires a pointer to the buffer, and the buffers size.
In my code I am using:
char *cwdBuf;
cwdBuf = malloc(100);
I don't know the size of the buffer needed, so I am reserving far more memory than is needed. What I would like to do is use the correct amount of memory.
Is there any way to do this?
What is your target platform? The getcwd() documentation here makes two important points:
As an extension to the POSIX.1-2001 standard, Linux (libc4, libc5, glibc) getcwd() allocates the buffer dynamically using malloc() if buf is NULL on call. In this case, the allocated buffer has the length size unless size is zero, when buf is allocated as big as necessary. It is possible (and, indeed, advisable) to free() the buffers if they have been obtained this way...
...The buf argument should be a pointer to an array at least PATH_MAX bytes long. getwd() does only return the first PATH_MAX bytes of the actual pathname.
There is usually a MAX_PATH macro defined which you can use. Also is there any reason not to just allocate on the stack?
Edit:
From the MSDN Docs:
#include <direct.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main( void )
{
    char* buffer;
    // Get the current working directory:
    if( (buffer = _getcwd( NULL, 0 )) == NULL )
        perror( "_getcwd error" );
    else
    {
        printf( "%s \nLength: %zu\n", buffer, strlen(buffer) );
        free(buffer);
    }
}
So it looks like if you pass in NULL, it allocates a buffer for you.
The length of the current working directory's path is not known in advance, but its upper limit can be queried.
The correct way of allocating memory to use for getcwd() is by querying the system for the maximum length buffer required, via pathconf(".", _PC_PATH_MAX); before allocating the buffer and calling getcwd().
The Open Group's UNIX standard man pages document this in the example code for getcwd(). Quote:
#include <stdlib.h>
#include <unistd.h>
...
long size;
char *buf;
char *ptr;
size = pathconf(".", _PC_PATH_MAX);
if ((buf = (char *)malloc((size_t)size)) != NULL)
ptr = getcwd(buf, (size_t)size);
...
How about realloc() once you know the size of the result?
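Combining the two ideas, a sketch that grows the buffer until getcwd() stops failing with ERANGE (the POSIX error for "buffer too small") and then uses realloc() to trim it to the exact length. It assumes a POSIX getcwd() rather than Windows' _getcwd(), and the name cwd_alloc and the 128-byte starting size are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns the current working directory in a
 * malloc'd buffer of exactly strlen() + 1 bytes, or NULL on error. */
char *cwd_alloc(void)
{
    size_t size = 128;            /* assumed initial guess */
    char *buf = NULL;

    for (;;) {
        char *tmp = realloc(buf, size);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        if (getcwd(buf, size) != NULL)
            break;                /* success */
        if (errno != ERANGE || size > (size_t)-1 / 2) {
            free(buf);            /* a real error, not "buffer too small" */
            return NULL;
        }
        size *= 2;                /* ERANGE: grow and retry */
    }

    /* Trim to the exact length; if shrinking fails, the larger block is still valid. */
    char *exact = realloc(buf, strlen(buf) + 1);
    return exact != NULL ? exact : buf;
}
```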

Proper memory allocation?

How would I only allocate as much memory as really needed without knowing how big the arguments to the function are?
Usually, I would use a fixed size, and calculate the rest with sizeof (note: the code isn't supposed to make sense, but to show the problem):
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
int test(const char* format, ...)
{
    char* buffer;
    int bufsize;
    int status;
    va_list arguments;
    va_start(arguments, format);
    bufsize = 1024; /* fixed size */
    bufsize = sizeof(arguments) + sizeof(format) + 1024;
    buffer = (char*)malloc(bufsize);
    status = vsprintf(buffer, format, arguments);
    fputs(buffer, stdout);
    va_end(arguments);
    return status;
}
int main()
{
    const char* name = "World";
    test("Hello, %s\n", name);
    return 0;
}
However, I don't think this is the way to go... so, how would I calculate the required buffer size properly here?
If you have vsnprintf available to you, I would make use of that. It prevents buffer overflow since you provide the buffer size, and it returns the actual size needed.
So allocate your 1K buffer, then attempt to use vsnprintf to write into that buffer, limiting the size. If the size returned was less than your buffer size, then it's worked and you can just use the buffer. (vsnprintf returns the length excluding the terminating null byte, so a return value equal to the buffer size means the output was truncated.)
If the size returned was greater than or equal to the buffer size, then call realloc to get a bigger buffer (at least the returned size plus one) and try it again. Provided the data hasn't changed (e.g., threading issues), the second one will work fine since you already know how big it will be. Note that a va_list can only be traversed once, so make a copy with va_copy before the first call (or call va_end and va_start again) and use that for the second call.
This is relatively efficient provided you choose your default buffer size carefully. If the vast majority of your outputs are within that limit, very few reallocations have to take place (see below for a possible optimisation).
If you don't have a vsnprintf-type function, a trick we've used before is to open a file handle to /dev/null and use that for the same purpose (checking the size before outputting to a buffer). Use vfprintf to that file handle to get the size (the output goes to the bit bucket), then allocate enough space based on the return value, and vsprintf to that buffer. Again, it should be large enough since you've figured out the needed size.
An optimisation to the methods above would be to use a local buffer, rather than an allocated buffer, for the 1K chunk. This avoids having to use malloc in those situations where it's unnecessary, assuming your stack can handle it.
In other words, use something like:
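int test(const char* format, ...)
{
    char buff1k[1024];
    char *buffer = buff1k; // default to local buffer, no malloc.
    va_list arguments, copy;
    va_start(arguments, format);
    va_copy(copy, arguments); // a va_list can only be used once; keep a copy for the retry
    int need = 1 + vsnprintf (buffer, sizeof (buff1k), format, arguments);
    va_end(arguments);
    if (need > 0 && (size_t)need > sizeof (buff1k)) {
        buffer = malloc (need);
        // Now you have a big-enough buffer, vsnprintf into there.
        if (buffer != NULL)
            vsnprintf (buffer, need, format, copy);
    }
    va_end(copy);
    if (need <= 0 || buffer == NULL)
        return -1; // formatting or allocation failed
    // Use string at buffer for whatever you want.
    fputs (buffer, stdout);
    // Only free buffer if it was allocated.
    if (buffer != buff1k)
        free (buffer);
    return need - 1;
}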
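Since the question asks for exactly as much memory as needed, another option (assuming a C99-conforming vsnprintf) is to measure first: C99 allows a null buffer with size 0, in which case vsnprintf() only returns the length the output would need. format_alloc is a hypothetical name:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: like sprintf(), but allocates exactly enough memory
 * for the result. Returns NULL on error; the caller must free() it. */
char *format_alloc(const char *format, ...)
{
    va_list args, copy;
    va_start(args, format);
    va_copy(copy, args);                          /* a va_list can only be walked once */
    int need = vsnprintf(NULL, 0, format, args);  /* C99: measure only, write nothing */
    va_end(args);

    char *buf = NULL;
    if (need >= 0) {
        buf = malloc((size_t)need + 1);           /* +1 for the terminating NUL */
        if (buf != NULL)
            vsnprintf(buf, (size_t)need + 1, format, copy);
    }
    va_end(copy);
    return buf;
}
```

For example, format_alloc("Hello, %s\n", name) returns a string the caller must free(). This costs two formatting passes every time, whereas the 1K-buffer approach above usually needs only one; glibc and the BSDs also provide vasprintf(), which does the same job but is not part of ISO C.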
