API design - allocate output? - c

Is it a good idea for C API functions to allocate their output, or to have the user specify the output buffer? For example:
BOOL GetString(
    PWSTR *String
    );
...
PWSTR string;
GetString(&string);
Free(string);
vs
BOOL GetString(
    PWSTR Buffer,
    ULONG BufferSize,
    PULONG RequiredBufferSize
    );
...
// A lot more code than in the first case
More specifically I'm wondering why the Win32 API primarily uses the second case (e.g. GetWindowText, LookupAccountSid). If an API function knows how big the output is, why have the user try to guess the output size? I can't find any information on why the second case would be used.
Also: the LookupAccountSid example is particularly bad. Internally it uses the LSA API, which allocates the output for the caller. Then LookupAccountSid gets the user to allocate a buffer (and guess the correct buffer size) when it could just return the output from LSA! Why?

The Win32 API does not pre-allocate buffers because it wants to give the calling code the choice of how to provide the buffer. It allows for them to provide stack and a variety of heap based buffers. There are several places where the maximum size of the buffer is known ahead of time and developers want the simplicity of using a stack based buffer.
The file system is the best example, as paths traditionally won't exceed MAX_PATH. So rather than allocating and freeing, the developer simply declares a stack-based buffer.
The advantage to having the C API allocate memory is that it simplifies the calling pattern. The downside of the Win32 pattern is that most times you end up calling the API twice. The first time to determine the size of the buffer, then the second time with a buffer of appropriate size. With an API allocated buffer only one call is needed.
The downside, though, is that you take away the choice of allocation from the caller. Additionally, you must communicate your choice so that they can properly free the buffer (Windows, for instance, can allocate from several different places).
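As a rough illustration of the caller-supplied-buffer pattern described above, here is a minimal sketch in portable C rather than Win32. The names get_greeting and copy_greeting and the string they return are made up for the example; only the shape of the interface (buffer, buffer size, required size) follows the question.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical API in the Win32 style: the caller supplies the buffer.
 * The required size (including the terminator) is always reported so
 * the caller can retry with a larger buffer. */
bool get_greeting(char *buffer, size_t buffer_size, size_t *required_size)
{
    static const char text[] = "hello, world";

    if (required_size != NULL) {
        *required_size = sizeof text;
    }
    if (buffer == NULL || buffer_size < sizeof text) {
        return false;               /* buffer missing or too small */
    }
    memcpy(buffer, text, sizeof text);
    return true;
}

/* Caller side: the first call asks for the size, the second fills the
 * buffer. The result must be released with free(). */
char *copy_greeting(void)
{
    size_t needed = 0;
    get_greeting(NULL, 0, &needed);             /* size query only */

    char *buffer = malloc(needed);
    if (buffer != NULL && !get_greeting(buffer, needed, NULL)) {
        free(buffer);
        buffer = NULL;
    }
    return buffer;
}
```

A caller that knows an upper bound can skip the size query and pass a stack array directly, which is the flexibility this answer is pointing at.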

The second approach has some advantages, such as:
It lets callers manage the lifetime of memory allocations.
It lets callers reuse allocated memory for different calls that follow the same pattern.
It lets callers decide which buffer to provide, e.g. stack or heap.

Related

Is there a way to either query what would realloc do, or prevent it from copying all memory on Windows and Linux?

I'm implementing a container similar to std::vector from C++. It has a buffer with associated capacity (memory which is reserved for this container) and size (actual size of the container).
When the user adds elements and size needs to exceed the capacity, I use realloc for the new capacity.
There is a reserve function for the container which sets the capacity in case the user knows it beforehand and doesn't want to risk allocating memory when filling the container with data.
Thus invariants might exist where the size is small (say zero) and the capacity is big (say 1MB). Then if the user calls reserve(even_bigger_capacity), what am I supposed to do?
I can just call realloc, but if realloc does end up allocating a new memory block, it will copy 1MB of useless bytes into it.
I can have a constant: WASTEFUL_COPY_BYTES and check capacity - size > WASTEFUL_COPY_BYTES, and manually call malloc and memcpy and copy only what's needed, in case it's true, and only call realloc if the difference is small, but in this case I'm missing opportunities to use realloc where it would return the same address.
Basically I need something like bool try_realloc(void *old_addr, size_t new_size) which would return true if realloc would return the same address, but won't try to allocate a new block and copy stuff.
...or something like void* part_realloc(void* old_addr, size_t new_size, size_t relevant_size) which would only copy relevant_size bytes into the new block, if it ends up allocating one.
I'm sure there are platform-specific ways of implementing both of these functions, so my question is: is there a library with such functions which works on major platforms or, if not, how would I go about implementing something like this at least for Windows and Linux?
So, on Windows, _expand is exactly what I need.
On Linux, things don't look as simple. I'll need to deep dive in glibc. Perhaps malloc_usable_size will be helpful.
edit: It's not (well, not very much: it returns only a few bytes more than the allocation size). It seems there is no way to implement this with glibc's public interface. The only way is to look through glibc's code, duplicate the data structures and use the chunk header before the returned memory block, which is a Bad Idea™
I'll report further findings here unless someone provides a better answer in the meantime
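For reference, the part_realloc idea from the question can at least be approximated portably. This is only a sketch under one big assumption: it always allocates a new block, so it never gets the "grow in place" benefit that realloc or _expand may provide. It only avoids copying the unused capacity.

```c
#include <stdlib.h>
#include <string.h>

/* Portable fallback for part_realloc: allocate a new block and copy
 * only the bytes that are actually in use. It never grows in place,
 * which is exactly the trade-off described in the question. */
void *part_realloc(void *old_addr, size_t new_size, size_t relevant_size)
{
    void *new_addr = malloc(new_size);
    if (new_addr == NULL) {
        return NULL;                /* like realloc: old block stays valid */
    }
    if (old_addr != NULL && relevant_size > 0) {
        size_t n = relevant_size < new_size ? relevant_size : new_size;
        memcpy(new_addr, old_addr, n);
    }
    free(old_addr);
    return new_addr;
}
```

A container's reserve function could call this when capacity - size is large and fall back to plain realloc otherwise, as suggested with WASTEFUL_COPY_BYTES.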

Is it bad practice to hide memory allocations in functions?

Should I expect the user to provide a memory chunk of sufficient size, say, for copying a file into a buffer? Or should I allocate the memory myself, and expect the user to free it when they're done? For example, the function strdup() allocates memory itself, but the function fread() expects only a buffer of sufficient size.
It depends - I've seen C APIs use all kinds of patterns for this, such as:
functions that require the buffer and buffer size to be provided, and return the required size (so that you can adjust the buffer size if it was truncated); many of these allow passing NULL as a buffer if you are just asking how big the buffer should be; this allows the caller to use an existing buffer or to allocate an appropriately sized one, although with two calls;
separate functions to obtain needed size and to fill the buffer; same as above, but with a clearer interface;
functions that require buffer and buffer size, but can allocate the buffer themselves if NULL is passed as buffer; maximum flexibility and terseness, but the function signature can get confusing;
functions that just return a newly allocated string; simple to use and avoids bugs arising from unguarded truncation, but inflexible if performance is a concern; also, requires the caller to remember to free the returned value, which is avoided in the cases above if using a stack-allocated buffer;
functions that return a pointer to a static buffer, and then the caller is responsible to do whatever with it; extremely easy to use, extremely easy to misuse; requires care in case of multithreading (needs thread local storage) and if reentrancy is a concern.
The last one is generally a bad idea - it poses problems with reentrancy and thread safety; the one before it can be used but may pose efficiency problems - I generally don't want to waste time in allocations if I have already a buffer big enough. All the others are generally pretty much OK.
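The first pattern in the list (pass NULL to ask for the size, then call again) is what standard snprintf does, so it makes a convenient illustration. A minimal sketch; format_endpoint is a hypothetical helper, while the snprintf behaviour used here is standard since C99.

```c
#include <stdio.h>
#include <stdlib.h>

/* Formats "<name>:<port>" into a freshly allocated string.
 * The caller releases the result with free(). */
char *format_endpoint(const char *name, int port)
{
    /* First call: a NULL buffer with size 0 only computes the length. */
    int length = snprintf(NULL, 0, "%s:%d", name, port);
    if (length < 0) {
        return NULL;
    }

    char *out = malloc((size_t)length + 1);
    if (out == NULL) {
        return NULL;
    }

    /* Second call: fill the right-sized buffer. */
    snprintf(out, (size_t)length + 1, "%s:%d", name, port);
    return out;
}
```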
But besides the specifics of the interface, the most important point if you allocate stuff and/or return pointers is to clearly document who owns the pointed memory - is it a static object in your library? Is it a pointer to some internal of an object provided by the caller? Is it dynamically allocated stuff? Is the caller responsible for freeing it? Is it just the buffer that was provided as argument?
Most importantly, in case you allocated stuff, always specify how to deallocate it; notice that, if you are building a library that may be compiled as a dll/so, it's a good idea to provide your own deallocation function (even if it's just a wrapper around free) to avoid mismatches between different versions of the C runtime running in the same process. Also, it avoids tying your code to the C library allocator - today it may be fine, tomorrow it may turn out that using a custom allocator may be a better idea.
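A sketch of what pairing an allocating function with its own deallocation function might look like. The mylib_ names and the returned string are invented for the example; the point is only that the caller never calls free directly.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical library function that allocates its result. */
char *mylib_get_name(void)
{
    static const char name[] = "example";
    char *copy = malloc(sizeof name);
    if (copy != NULL) {
        memcpy(copy, name, sizeof name);
    }
    return copy;
}

/* Matching release function. Today it is a thin wrapper around free,
 * but it runs inside the library, so it always uses the same C runtime
 * (or custom allocator) that produced the pointer. */
void mylib_free(void *ptr)
{
    free(ptr);
}
```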
Is it bad practice to hide memory allocations in functions?
Sometimes.
This answer details one pitfall of giving a function total freedom in memory allocation: it can be abused.
A classic case occurs when the function itself determines the size needed, so the calling code lacks the information needed to provide the memory buffer beforehand.
This is the case with getline(), where the stream content determines the size of the allocation. The problem with this, especially when the stream is stdin, is that control over memory allocation is given to external sources and is not limited by the calling code, i.e. the program. External input may overwhelm memory space: an exploit.
With a modified function, such as ssize_t getline_limit(char **lineptr, size_t *n, FILE *stream, size_t limit);, the function could still provide a right-size allocation, yet prevent abuse by an attacker.
#define LIMIT 1000000
char *line = NULL;
size_t len = 0;
ssize_t nread;
while ((nread = getline_limit(&line, &len, stdin, LIMIT)) != -1) {
    // use line (nread bytes); no single line can exceed LIMIT bytes
}
free(line);
An example where this is not an issue would be an allocation with a well bounded use.
// Convert `double` to its decimal character representation allocating a right-size buffer
// At worst a few thousand characters
char *double_to_string_exact_alloc(double x)
Functions that perform memory allocation need some level of control to prevent unlimited memory allocation either with a specific parameter or by nature of the task.
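getline_limit is not an existing library function, so here is one possible sketch of it. The behaviour on an over-long line (return -1 and leave the rest of the line unread) is an assumption made for the example, not something specified above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>   /* ssize_t (POSIX) */

/* Like POSIX getline, but fails with -1 once a line would need more
 * than `limit` bytes, counting the terminating '\0'. */
ssize_t getline_limit(char **lineptr, size_t *n, FILE *stream, size_t limit)
{
    if (lineptr == NULL || n == NULL || stream == NULL) {
        return -1;
    }
    if (*lineptr == NULL) {
        *n = 0;
    }

    size_t used = 0;
    int ch;
    while ((ch = fgetc(stream)) != EOF) {
        if (used + 2 > limit) {             /* no room for ch plus '\0' */
            return -1;
        }
        if (used + 2 > *n) {                /* grow, but never past limit */
            size_t new_cap = *n ? *n * 2 : 64;
            if (new_cap > limit) {
                new_cap = limit;
            }
            char *grown = realloc(*lineptr, new_cap);
            if (grown == NULL) {
                return -1;
            }
            *lineptr = grown;
            *n = new_cap;
        }
        (*lineptr)[used++] = (char)ch;
        if (ch == '\n') {
            break;
        }
    }
    if (used == 0) {
        return -1;                          /* EOF or error before any data */
    }
    (*lineptr)[used] = '\0';
    return (ssize_t)used;
}
```

Because the capacity is capped at limit, the total allocation is bounded by the caller no matter what arrives on the stream.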
C library functions generally refrain from returning allocated memory. That's at least part of the reason why strdup was long absent from the standard library (it was only added in C23), along with a popular scanf extension for reading C strings of unlimited length.
Your library could choose either way. Using pre-allocated buffers is more flexible, because it lets users pass you statically allocated buffers. This flexibility comes at a cost, because user's code becomes more verbose.
If you choose to allocate memory for a custom struct dynamically, it is a good idea to make a matching function for deallocating the struct once it becomes unnecessary to the user.

C Design: Pass memory address or return

In some functions (such as *scanf variants) there is an argument that takes a memory space for the result. You could also write the code so that it returns an address. What are the advantages, and why design the function in such a weird way?
Example
void process_settings(char* data)
{
    .... // open file and put the contents in the data memory
    return;
}
vs
char* process_settings()
{
    char* data = malloc(some_size);
    .... // open file and load it into data memory
    return data;
}
The benefit is that you can reserve the return value of the function for error checking, status indicators, etc, and actually send back data using the output parameter. In fact, with this pattern, you can send back any amount of data along with the return value, which could be immensely useful. And, of course, with multiple calls to the function (for example, calling scanf in a loop to validate user input), you don't have to malloc every time.
One of the best examples of this pattern being used effectively is the function strtol, which converts a string to a long.
The function accepts a pointer to a character pointer (char **) as one of its parameters. It's common to declare a char * locally as endptr and pass its address to the function. The function returns the converted number if it was able to convert one; if not, it returns 0. In either case it sets the pointer passed in to the first character it could not convert, for example the non-digit character that caused the failure.
You can then report that the conversion failed on that particular character.
This is better design than using global error indicators; consider multithreaded programs. It likely isn't reasonable to use global error indicators if you'll be calling functions that could fail in several threads.
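To make the strtol pattern concrete, here is a small sketch. parse_long is a hypothetical wrapper written for this example; strtol, endptr handling and ERANGE are standard C. Note that strtol still reports overflow through errno, so the wrapper checks that as well.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Returns true only if the whole string is a valid long.
 * On failure, *bad_char points at the character where parsing stopped. */
bool parse_long(const char *text, long *value, const char **bad_char)
{
    char *endptr;
    errno = 0;
    long result = strtol(text, &endptr, 10);

    if (bad_char != NULL) {
        *bad_char = endptr;
    }
    if (endptr == text) {
        return false;               /* no digits at all */
    }
    if (*endptr != '\0') {
        return false;               /* stopped at a non-digit */
    }
    if (errno == ERANGE) {
        return false;               /* value does not fit in a long */
    }
    *value = result;
    return true;
}
```

The return value carries success or failure, while the actual data (the number and the offending position) comes back through output parameters.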
You mention that a function should be responsible for its own memory. Well, scanf doesn't exist to create the memory to store the scanned value. It exists to scan a value from an input buffer. The responsibilities of that function are very clear and don't include allocating the space.
It's also not unreasonable to return a malloc'd pointer. The programmer should be prudent, though, and free the returned pointer when they're done using it.
The decision of using one method instead of another depends on what you intend to do.
Example
If you want to modify an array inside a function and keep the modification in the original array, you should use your first example.
If you are creating your own data structure, you have to deal with all the operations. And if you want to create a new struct you should allocate memory inside the function and return the pointer. The second example.
If you want to "return" two values from a function, like a vector and the length of the vector, and you don't want to create a struct for this, you could return the pointer of the vector and pass an int pointer as an argument of the function. That way you could modify the value of the int inside the function and you use it outside too.
char* return_vector_and_length(int* length);
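One possible body for that prototype, as a sketch. The contents of the vector are made up for the example, and the choice that the caller frees the result is an assumption that a real API would need to document.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated vector and writes its length through the
 * out parameter. The caller releases the vector with free(). */
char *return_vector_and_length(int *length)
{
    static const char contents[] = { 'a', 'b', 'c' };

    char *vector = malloc(sizeof contents);
    if (vector == NULL) {
        *length = 0;
        return NULL;
    }
    memcpy(vector, contents, sizeof contents);
    *length = (int)sizeof contents;
    return vector;
}
```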
Let’s say that, for example, you wanted to store process settings in a specific place in memory. With the first version, you can write this as process_settings(output_buffer + offset);. How would you do it if you only had the second version? What would happen to performance if it were a really big array? Or what if, let’s say, you’re writing a multithreaded application where having all the threads call malloc() all the time would make them fight over the heap and serialize the program, so you want to preallocate all your buffers?
Your intuition is correct in some cases, though: on modern OSes that can memory-map files, it does turn out to be more efficient to return a pointer to the file contents than the way the standard library was historically written, and this is how glib does it. Sometimes allocating all your buffers on the heap helps avoid buffer overflows that smash the stack.
An important point is that, if you have the first version, you can trivially get the second one by calling malloc and then passing the buffer as the dest argument. But, if you have only the second, you can’t implement the first without copying the whole array.
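A short sketch of that last point. The names fill_settings and alloc_settings, the SETTINGS_SIZE constant and the settings string are all hypothetical; unlike process_settings in the question, the first version here also takes the buffer size so it can refuse to overflow.

```c
#include <stdio.h>
#include <stdlib.h>

#define SETTINGS_SIZE 64   /* hypothetical fixed size for the example */

/* First style: the caller provides the destination buffer.
 * Returns 0 on success, -1 if the buffer is too small. */
int fill_settings(char *dest, size_t dest_size)
{
    int written = snprintf(dest, dest_size, "mode=fast;retries=3");
    return (written >= 0 && (size_t)written < dest_size) ? 0 : -1;
}

/* Second style, built trivially on the first: allocate, then delegate.
 * The caller releases the result with free(). */
char *alloc_settings(void)
{
    char *data = malloc(SETTINGS_SIZE);
    if (data != NULL && fill_settings(data, SETTINGS_SIZE) != 0) {
        free(data);
        data = NULL;
    }
    return data;
}
```

Going the other way (building fill_settings from alloc_settings) would require an allocation plus a copy on every call.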

Does libuv provide any facilities to attach a buffer to a connection and re use it

I am evaluating libuv as a library for a C/c++ server that I am writing. The protocol is length prefixed so as soon as I can read a 32 bit integer from the stream I should be able to tell what size of buffer I should allocate. The documentation says that the uv_read_start function might be called multiple times.
UV_EXTERN int uv_read_start(uv_stream_t*, uv_alloc_cb alloc_cb,
uv_read_cb read_cb);
Since I am using a length prefixed protocol, once I know the right size of the buffer I would like to allocate it and re use it for subsequent reads till I have received all my bytes. Is there an easy way to do this with libuv? Right now it seems like the uv_alloc_cb function has to take care of this. Can I associate a buffer with my stream object instead of putting it in a map or something?
Since I am using a length prefixed protocol, I would not like to allocate a buffer on the heap at all till I can read the first 4 bytes (32 bits). Is it possible for me to allocate on the stack a buffer of size 4 and have the uv_read_cb function actually do the heap allocation? Is the uv_read_cb function invoked synchronously as part of the uv_read_start function? If it is then seems like I should be able to allocate on the stack when I know that I don't already have a buffer attached to my stream.
Answering my own question. I found the answers on the libuv mailing list here: https://groups.google.com/forum/#!topic/libuv/fRNQV_QGgaA
Copying the details here if the link becomes unavailable:
Attaching your own data structure to a handle:
The handle has a void* data field that is yours to use. You can
make it point to an auxiliary structure where you store the length
and the buffer.
Alternatively, you can embed the uv_tcp_t in another structure, then
look up the embedding structure with container_of. It's not a
standard C macro but you can find its definition and usage examples in
the libuv/ source tree. Its benefit is that it just does some pointer
arithmetic, it saves you from another level of pointer indirection.
Stack allocation for the receiving buffer:
No, that's not possible. The proper way of thinking about it is that
your alloc_cb returns a buffer that libuv will fill with data sometime
in the future. The stress is on "sometime" because there are no
guarantees when that will happen; it may be immediate, it may be
seconds (or minutes) away.
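To illustrate the container_of approach from the quoted reply without depending on libuv, here is a sketch that uses a stand-in handle type. fake_handle and the connection struct are hypothetical; in real code the embedded member would be a uv_tcp_t, and the macro below is a common definition rather than necessarily the exact one in the libuv tree.

```c
#include <stddef.h>

/* Stand-in for a libuv handle such as uv_tcp_t, so the example
 * compiles on its own. */
typedef struct fake_handle {
    void *data;
} fake_handle;

/* Common definition of container_of (not standard C): step back from
 * a member pointer to the structure that contains it. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical per-connection state with the handle embedded in it. */
typedef struct connection {
    size_t expected_len;    /* taken from the 4-byte length prefix */
    size_t received;        /* bytes of the current message so far */
    char *buffer;           /* heap buffer reused across reads */
    fake_handle handle;     /* would be uv_tcp_t in real code */
} connection;

/* What a callback would do when it is handed only the handle. */
connection *connection_from_handle(fake_handle *handle)
{
    return container_of(handle, connection, handle);
}
```

An alloc callback could use this to find the connection's buffer and hand libuv the unfilled part of it, instead of allocating on every read.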

How to implement deterministic malloc

Say I have two instances of an application, with the same inputs and same execution sequence. Therefore, one instance is a redundant one and is used for comparing data in memory with the other instance, as a kind of error detection mechanism.
Now, I want all memory allocations and deallocations to happen in exactly the same manner in the two processes. What is the easiest way to achieve that? Write my own malloc and free? And what about memories allocated with other functions such as mmap?
I'm wondering what you are trying to achieve. If your process is deterministic, then the pattern of allocation / deallocation should be the same.
The only possible difference could be the addresses returned by malloc. But you should probably not depend on them (the easiest way being to not use pointers as keys in a map or other data structure). And even then, there should only be a difference if the allocation is not done through sbrk (glibc uses anonymous mmap for large allocations), or if you are using mmap (as by default the address is selected by the kernel).
If you really want to have exactly the same address, one option is to have a large static buffer and to write a custom allocator that does use memory from this buffer. This has the disadvantage of forcing you to know beforehand the maximum amount of memory you'll ever need. In a non-PIE executable (gcc -fno-pie -no-pie), a static buffer will have the same address every time. For a PIE executable you can disable the kernel's address space layout randomization for loading programs. In a shared library, disabling ASLR and running the same program twice should lead to the same choices by the dynamic linker for where to map libraries.
If you don't know beforehand the maximum size of the memory you want to use, or if you don't want to recompile each time this size increases, you can also use mmap to map a large anonymous buffer at a fixed address. Simply pass the size of the buffer and the address to use as parameters to your process and use the returned memory to implement your own malloc on top of it.
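#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

static void* malloc_buffer = NULL;
static size_t malloc_buffer_len = 0;
static size_t malloc_buffer_used = 0;

void* malloc(size_t size) {
    // Use malloc_buffer & malloc_buffer_len to implement your
    // own allocator. If you don't read uninitialized memory,
    // it can be deterministic.
    // Placeholder: a bump allocator that never reuses memory. A real
    // replacement also needs matching free, calloc and realloc.
    if (size > SIZE_MAX - 15) {
        return NULL;
    }
    size = (size + 15) & ~(size_t)15;   // keep 16-byte alignment
    if (size > malloc_buffer_len - malloc_buffer_used) {
        return NULL;
    }
    void* memory = (char*)malloc_buffer + malloc_buffer_used;
    malloc_buffer_used += size;
    return memory;
}

int main(int argc, char** argv) {
    size_t buf_size = 0;
    uintptr_t buf_addr = 0;
    for (int i = 1; i + 1 < argc; ++i) {
        // Base 0 accepts decimal and 0x-prefixed hex; atoi would
        // truncate 64-bit addresses.
        if (strcmp(argv[i], "--malloc-size") == 0) {
            buf_size = (size_t)strtoull(argv[++i], NULL, 0);
        } else if (strcmp(argv[i], "--malloc-addr") == 0) {
            buf_addr = (uintptr_t)strtoull(argv[++i], NULL, 0);
        }
    }
    // Anonymous mappings need MAP_ANONYMOUS and take fd -1.
    malloc_buffer = mmap((void*)buf_addr, buf_size, PROT_WRITE|PROT_READ,
                         MAP_FIXED|MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    // editor's note: omit MAP_FIXED since you're checking the result anyway
    if (malloc_buffer == MAP_FAILED || malloc_buffer != (void*)buf_addr) {
        // Could not get requested memory block, fail.
        exit(1);
    }
    malloc_buffer_len = buf_size;
    return 0;
}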
By using MAP_FIXED, we are telling the kernel to replace any existing mappings that overlap with this new one at buf_addr.
(Editor's note: MAP_FIXED is probably not what you want. Specifying buf_addr as a hint instead of NULL already requests that address if possible. With MAP_FIXED, mmap will either return an error or the address you gave it. The malloc_buffer != (void*)buf_addr check makes sense for the non-FIXED case, which won't replace an existing mapping of your code or a shared library or anything else. Linux 4.17 introduced MAP_FIXED_NOREPLACE, which you can use to make mmap return an error instead of memory at a wrong address you don't want to use. But still leave the check in so your code works on older kernels.)
If you use this block to implement your own malloc and don't use other non-deterministic operation in your code, you can have complete control of the pointer values.
This supposes that your usage pattern of malloc / free is deterministic, and that you don't use libraries that are non-deterministic.
However, I think a simpler solution is to keep your algorithms deterministic and not to depend on addresses being deterministic. This is possible. I've worked on a large-scale project where multiple computers had to update state deterministically (so that each program had the same state, while only transmitting inputs). If you don't use pointers for anything other than referencing objects (the most important thing is to never use a pointer value for anything: not as a hash, not as a key in a map, ...), then your state will stay deterministic.
Unless what you want to do is to be able to snapshot the whole process memory and do a binary diff to spot divergence. I think that's a bad idea, because how will you know that both of them have reached the same point in their computation? It is much easier to compare the output, or to have the process compute a hash of the state and use that to check that they are in sync, because you can control when this is done (and thus it becomes deterministic too; otherwise your measurement is non-deterministic).
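As a sketch of the "hash of the state" idea: the structure below is a hypothetical piece of application state, and FNV-1a is just one simple hash chosen for the example (any stable hash would do). The important part is that only values are hashed, never pointer values, so two processes with different heap addresses still produce the same result.

```c
#include <stddef.h>
#include <stdint.h>

#define FNV1A64_INIT 0xcbf29ce484222325ULL

/* 64-bit FNV-1a over a byte range, continuing from `hash`. */
uint64_t fnv1a64(uint64_t hash, const void *data, size_t len)
{
    const unsigned char *bytes = data;
    for (size_t i = 0; i < len; ++i) {
        hash ^= bytes[i];
        hash *= 0x100000001b3ULL;
    }
    return hash;
}

/* Hypothetical application state: a singly linked list of values. */
struct node {
    int32_t value;
    struct node *next;
};

/* Hash the values reachable from head. The next pointers are followed
 * but never fed into the hash. */
uint64_t hash_list(const struct node *head)
{
    uint64_t hash = FNV1A64_INIT;
    for (; head != NULL; head = head->next) {
        hash = fnv1a64(hash, &head->value, sizeof head->value);
    }
    return hash;
}
```

Both instances would compute this at the same logical point (for example after processing input number N) and compare the two 64-bit results.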
What is not deterministic is not only malloc but mmap (the basic syscall to get more memory space; it is not a function, it is a system call so is elementary or atomic from the application's point of view; so you cannot rewrite it within the application) because of address space layout randomization on Linux.
You could disable it with
echo 0 > /proc/sys/kernel/randomize_va_space
as root, or through sysctl.
If you don't disable address space layout randomization you are stuck.
And you did ask a similar question previously, where I explained that your malloc-s won't always be deterministic.
I still think that for some practical applications, malloc cannot be deterministic. Imagine for instance a program having a hash table keyed by the pids of the child processes it is launching. Collisions in that table won't be the same in all your processes, etc.
So I believe you won't succeed in making malloc deterministic in your sense, whatever you'll try (unless you restrict yourself to a very narrow class of applications to checkpoint, so narrow that your software won't be very useful).
Simply put, as others have stated: if the execution of your program's instructions is deterministic, then memory returned by malloc() will be deterministic. That assumes your system's implementation doesn't have some call to random() or something to that effect. If you are unsure, read the code or documentation for your system's malloc.
This is with the possible exception of ASLR, as others have also stated. If you don't have root privileges, you can disable it per-process via the personality(2) syscall and the ADDR_NO_RANDOMIZE parameter. See here for more information on the personalities.
Edit: I should also say, if you are unaware: what you're doing is called bisimulation and is a well-studied technique. If you didn't know the terminology, it might help to have that keyword for searching.
When writing high-reliability code, the usual practise is to avoid malloc and other dynamic memory allocation. A compromise sometimes used is to do all such allocation only during system initialisation.
You can use shared memory to store your data. It will be accessible from both processes, and you can fill it in a deterministic way.

Resources