Using the sample code to learn about libuv, I have come across a side effect I don't fully understand. The code uses malloc() to obtain memory to store data from a client on the network and then sends the same data back; it just echoes. It then uses free() to release the memory. This repeats over and over through a callback loop. The line of code getting the memory is:
uv_write_t *req = (uv_write_t *) malloc(sizeof(uv_write_t));
and the lines freeing the memory are:
free((char*) req->data);
free(req);
However, if you input a long string such as "Whats the word on the street?" to be echoed, and then input shorter strings like "Hi", fragments of the older string will reappear after the shorter string is echoed back. For instance, the output can look like this:
Whats the word on the street?
hi
hi
howdy
howdy
he word on the street?
Since the memory is being freed, I am uncertain why the older fragment is showing back up. My thought on the subject is that either there is something I don't understand about malloc() and free(), or there is a bug in the library in how it determines the size needed for the incoming data, and after using a longer string I am getting garbage as part of a memory block that was too big. If that is the case, then the fact that it is a fragment of my earlier input is just happenstance. Is this the likely reason, or am I missing something? Is there any other info that I should include to clarify it?
Implementations of malloc() will vary, but it's safe to assume that calls to malloc() can return a pointer to a previously free()'d chunk of memory, and that the memory returned will not have been zeroed out. In other words, it's perfectly normal for malloc() to give you a pointer to a block that still holds previously initialized data.
That said, I suspect the root problem here is an unterminated string, probably an artifact of the way you are serializing it. For example, if you are merely writing strlen(str) bytes from the client, you are not writing the terminating NUL ('\0'). As a result, when the server receives the message it will have an unterminated string. If this is how you plan to pass the string, and you plan to treat it as a normal null-terminated string, the server will need to copy the data into a buffer large enough to accommodate the string plus the additional NUL character.
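A minimal sketch of that copy, assuming 'nread' bytes of raw payload arrived at 'base' (both names are illustrative):

#include <stdlib.h>
#include <string.h>

char *copy_as_cstring(const char *base, size_t nread) {
    char *msg = malloc(nread + 1);   /* +1 for the terminator */
    if (msg != NULL) {
        memcpy(msg, base, nread);
        msg[nread] = '\0';           /* now safe for printf("%s"), strlen(), ... */
    }
    return msg;
}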
So why then are you seeing fragments of past messages? Probably dumb luck. If this is a really simple app, it's very possible for malloc() to return a chunk of memory that overlaps with the previous request.
So then why am I getting such clean output? Shouldn't I see tons of garbled data, or a segfault from my string operations walking off into infinity? Again, dumb luck. Keep in mind that when the kernel first gives your application a page of memory, it will have zeroed that page out first (this is done for security reasons). So, even though you might not have terminated the string, the page of heap memory where your string resides might be sitting in a relatively pristine zeroed-out state.
uv_write_t *req is not the data to be sent or received. It's just something like a handle to a write request.
Neither is req->data. That is a pointer to arbitrary private data for you. It might be used for example if you wanted to pass around some data related to the connection.
The actual payload data are sent through a write buffer (uv_buf_t) and received into a buffer that is allocated when a read request is served. That's why the read function wants an alloc parameter. Later that buffer is passed to the read callback.
The freeing of req->data assumes that 'data' pointed to some private data, typically a structure, that was malloc'd (by you).
As a rule of thumb, a socket is represented by a uv_xxx_t while reading and writing use 'request' structures.
Writing a server (a typical uv use case) one doesn't know how many connections there will be, hence everything is allocated dynamically.
To make your life easier you might think in terms of pairs (open/close or start/done). So when accepting a new connection you start a cycle and allocate the client; when closing that connection you free it. When writing, you allocate the request as well as the payload data buffer; when done writing, you free them. When reading, you allocate a read request, while the payload data buffer is allocated behind the scenes (through the alloc callback); when done reading (and having copied the payload data), you free them both.
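Here is a minimal sketch of that pairing against the libuv 1.x API (older releases pass uv_buf_t by value, so treat the exact signatures as an assumption; the callback names are made up). Note that the write buffer is sized to exactly nread bytes, which also avoids echoing stale bytes from an oversized block:

#include <stdlib.h>
#include <uv.h>

static void alloc_cb(uv_handle_t *handle, size_t suggested_size, uv_buf_t *buf) {
    /* libuv asks us for a buffer; the payload lands here, not in the request */
    buf->base = malloc(suggested_size);
    buf->len = suggested_size;
}

static void write_cb(uv_write_t *req, int status) {
    /* done writing: free the payload stashed in req->data, then the request */
    free(req->data);
    free(req);
}

static void read_cb(uv_stream_t *client, ssize_t nread, const uv_buf_t *buf) {
    if (nread > 0) {
        uv_write_t *req = malloc(sizeof(uv_write_t));
        req->data = buf->base;  /* remember the payload so write_cb can free it */
        uv_buf_t wrbuf = uv_buf_init(buf->base, (unsigned int)nread);  /* echo exactly nread bytes */
        uv_write(req, client, &wrbuf, 1, write_cb);
        return;
    }
    free(buf->base);            /* EOF or error: the buffer is still ours to free */
    if (nread < 0)
        uv_close((uv_handle_t *)client, NULL);
}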
There are ways to get the job done without all those malloc/free pairs (which aren't glorious performance-wise), but for a novice I would agree with the uv docs; you should definitely start with the malloc/free route.
To give you an idea: I pre-allocate everything for some ten or hundred thousand connections, but that brings some administration and trickery with it, e.g. faking the allocation in the alloc callback by merely assigning one of your pre-allocated buffers.
If asked to guess, I'd suggest that avoiding malloc/free is only worth the trouble well beyond 5k-10k connections at any point in time.
Related
The recent Heartbleed vulnerability is caused by this particular unchecked execution:
buffer = OPENSSL_malloc(1 + 2 + payload + padding);
(according to http://java.dzone.com/articles/everything-you-need-know-about-2)
But how could malloc ever grab memory that is already dished out somewhere else? The payload and padding variables are filled in with user-supplied values, but it seems to me that these could only cause an out-of-memory error (with a very large value), not a shift in address space that would read the server's RAM outside of this very buffer.
OpenSSL uses its own memory allocator (for speed reasons, so they say). Therefore memory never gets passed back to the operating system. Instead, they pool unused buffers and re-use them.
If you call OPENSSL_malloc the chances are almost 100% that the buffer you get contains data previously used by OpenSSL. This could be encrypted data, unencrypted data or even the private encryption keys.
It doesn't. It grabs a small block of memory and then proceeds to copy a much larger amount of data out of it, reading past the end of the malloc'd block into whatever happens to lie after it on the heap -- other malloc'd blocks (both alive and dead) that have been used for other things (other clients or system stuff) -- and copies that raw data to the attacker.
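For illustration only, here is the shape of that bug, not OpenSSL's actual code (the names are made up): 'record' is the few bytes the attacker really sent, and 'claimed_len' is the attacker-supplied payload length, which nothing checks against the record's true size.

#include <stdlib.h>
#include <string.h>

unsigned char *build_heartbeat_response(const unsigned char *record,
                                        size_t claimed_len) {
    /* the response buffer is sized to the *claimed* length, so the
     * allocation itself is fine; the bug is the read that fills it */
    unsigned char *buffer = malloc(1 + 2 + claimed_len);
    if (buffer != NULL)
        /* over-read: walks past the end of the small received record into
         * adjacent heap data, which is then echoed back to the attacker */
        memcpy(buffer + 3, record, claimed_len);
    return buffer;
}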
The read() system call causes the kernel to copy the data instead of passing the buffer by reference. I was asked the reason for this in an interview. The best I could come up with were:
To avoid concurrent writes on the same buffer across multiple processes.
If the user-level process tries to access a buffer mapped to kernel virtual memory area it will result in a segfault.
As it turns out the interviewer was not entirely satisfied with either of these answers. I would greatly appreciate if anybody could elaborate on the above.
A zero copy implementation would mean the user level process would have to be given access to the buffers used internally by the kernel/driver for reading. The user would have to make an explicit call to the kernel to free the buffer after they were done with it.
Depending on the type of device being read from, the buffers could be more than just an area of memory. (For example, some devices could require the buffers to be in a specific area of memory. Or they could only support writing to a fixed area of memory given to them at startup.) In this case, failure of the user program to "free" those buffers (so that the device could write more data to them) could cause the device and/or its driver to stop functioning properly, something a user program should never be able to do.
The buffer is specified by the caller, so the only way to get the data there is to copy them. And the API is defined the way it is for historical reasons.
Note that your two points above are no problem for the alternative, mmap, which does pass the buffer by reference (and writing to it then writes to the file, so you then can't process the data in place, while many users of read do just that).
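A minimal sketch contrasting the two calls (the file name is made up): read() copies into a caller-supplied buffer, while mmap() hands back a reference to the kernel's page-cache pages.

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

void demo(void) {
    char buf[4096];
    int fd = open("data.bin", O_RDONLY);
    ssize_t n = read(fd, buf, sizeof buf);  /* kernel copies the data into buf */
    (void)n;                                /* buf is ours to modify in place */

    struct stat st;
    fstat(fd, &st);
    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    /* p references the file's pages directly: no up-front copy was made */
    munmap(p, st.st_size);
    close(fd);
}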
I might have been prepared to dispute the interviewer's assertion. The buffer in a read() call is supplied by the user process and therefore comes from the user address space. It's also not guaranteed to be aligned in any particular way with respect to page frames. That makes it tricky to do what is necessary to perform I/O directly into the buffer, i.e. map the buffer into the device driver's address space or wire it down for DMA. However, in limited circumstances, this may be possible.
I seem to remember that the BSD subsystem Mac OS X uses to copy data between address spaces had an optimisation in this respect, although I may be completely mistaken.
Here's the situation:
I'm analysing a program's interaction with a driver by using an LD_PRELOADed module that hooks the ioctl() system call. The system I'm working with (embedded Linux 2.6.18 kernel) luckily has the length of the data encoded into the 'request' parameter, so I can happily dump the ioctl data with the right length.
However, quite a lot of this data has pointers to other structures, and I don't know the length of these (this is what I'm investigating, after all). So I'm scanning the data for pointers and dumping the data at that position. I'm worried this could leave my code open to segfaults if the pointer is close to a segment boundary (and my early testing seems to show this is the case).
So I was wondering: what can I do to pre-emptively check whether the current process owns a particular offset before trying to dereference it? Is this even possible?
Edit: Just an update as I forgot to mention something that could be very important, the target system is MIPS based, although I'm also testing my module on my x86 machine.
Open a file descriptor to /dev/null and try write(null_fd, ptr, size). If it returns -1 with errno set to EFAULT, the memory is invalid. If it returns size, the memory is safe to read. There may be a more elegant way to query memory validity/permissions with some POSIX invention, but this is the classic simple way.
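A minimal sketch of that probe (the function name is made up): it returns 1 if [ptr, ptr + size) is readable by this process, 0 otherwise.

#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

int mem_readable(const void *ptr, size_t size) {
    int fd = open("/dev/null", O_WRONLY);
    if (fd < 0)
        return 0;
    ssize_t n = write(fd, ptr, size);  /* the kernel must read the range to write it */
    close(fd);
    return n == (ssize_t)size;         /* -1 with errno == EFAULT: memory is invalid */
}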
If your embedded Linux has the /proc filesystem mounted, you can parse the /proc/self/maps file and validate the pointers/offsets against it. The maps file contains the memory mappings of the process.
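A minimal sketch of that check (real code would also inspect the permission flags on each line): scan every mapping's start-end range and report whether the address falls inside one.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int addr_mapped(uintptr_t addr) {
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return 0;
    char line[512];
    uintptr_t lo, hi;
    int found = 0;
    while (fgets(line, sizeof line, f)) {
        /* each line starts with "start-end", both in hex */
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR, &lo, &hi) == 2 &&
            addr >= lo && addr < hi) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}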
I know of no such possibility, but you may be able to achieve something similar. As man 7 signal mentions, SIGSEGV can be caught. Thus, I think you could:
1. Start by dereferencing a byte sequence known to be a pointer
2. Access one byte after the other, at some point triggering SIGSEGV
3. In SIGSEGV's handler, mark a variable that is checked by the loop in step 2
4. Quit the loop; this page is done.
There are several problems with that.
Since several buffers may live in the same page, you might output what you think is one buffer but is, in reality, several. You may be able to help with that by also LD_PRELOADing Electric Fence, which would, AFAIK, cause the application to allocate a whole page for every dynamically allocated buffer. So you would not output several buffers thinking it is only one, but you still don't know where the buffer ends and would output much garbage at the end. Also, stack-based buffers can't be helped by this method.
You don't know where the buffers end.
Untested.
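For what it's worth, here is an equally untested sketch of that probe using sigsetjmp/siglongjmp so the handler can unwind back to the probing code (fragile: not thread-safe, and the handler is process-global):

#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf probe_env;

static void segv_handler(int sig) {
    (void)sig;
    siglongjmp(probe_env, 1);  /* unwind back into byte_readable() */
}

int byte_readable(const volatile char *p) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = segv_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    int ok;
    if (sigsetjmp(probe_env, 1) == 0) {
        (void)*p;   /* faults here if the address is unmapped */
        ok = 1;
    } else {
        ok = 0;     /* the handler jumped back: not readable */
    }
    sigaction(SIGSEGV, &old, NULL);  /* restore the previous handler */
    return ok;
}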
Can't you just check for the segment boundaries? (I'm guessing by segment boundaries you mean page boundaries?)
If so, page boundaries are well delimited (either 4K or 8K), so simple masking of the address should deal with it.
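A one-liner sketch of that masking, using the real page size rather than hard-coding 4K or 8K (the function name is made up): it computes how many bytes remain before the next page boundary.

#include <stdint.h>
#include <unistd.h>

size_t bytes_to_page_end(const void *ptr) {
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);  /* e.g. 4096 */
    return (size_t)(page - ((uintptr_t)ptr & (page - 1)));
}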
I am allocating the array and freeing it on every callback of an audio thread. The main user thread (a web browser) is constantly allocating and deallocating memory based on user input. I am sending the uninitialized float array to the audio card (example on the page in my profile). The idea is to hear program state changes.
When I call malloc(sizeof(float)*256*13) or smaller, I get an array filled with a wide range of floats with a seemingly random distribution. It is not right to call it random; presumably this comes from whatever the memory block previously held. This is the behavior I expected and want to exploit. However, when I do malloc(sizeof(float)*256*14) or larger, I get an array filled only with zeros. I would like to know why this cliff exists and if there's something I can do to get around it. I know it is undefined behavior per the standard, but I'm hoping someone who knows the implementation of malloc on some system might have an explanation.
Does this mean malloc is also memsetting the block to zero for larger sizes? That would be surprising, since it wouldn't be efficient. Even if there are more chunks of memory zeroed out, I'd expect something to show up sometimes, since the arrays are constantly changing.
If possible I would like to be able to obtain chunks of memory that are reallocated over recently freed memory, so any alternatives would be welcomed.
I guess this is a strange question for some, because my goal is to explore undefined behavior and use bad programming practices deliberately, but this is the application I am interested in making, so please bear with the usage of uninitialized arrays. I know the behavior of such usage is undefined, so please don't tell me not to do it. I'm developing on Mac OS X 10.5.
Most likely, the larger allocations result in the heap manager directly requesting pages of virtual address space from the kernel. Freeing will return that address space back to the kernel. The kernel must zero all pages that are allocated for a process - this is to prevent data leaking from one process to another.
Smaller allocations are handled by the user-mode heap manager within the process, which takes these larger page allocations from the kernel, carves them up into smaller blocks, and reuses blocks on subsequent allocations. These do not need to be zero-initialized, since the memory contents always come from your own process.
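A quick experiment to see that boundary (deliberately UB-adjacent, like the question itself): write a pattern, free, re-allocate the same size, and check whether the pattern survives. Whether it does depends entirely on the allocator.

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t n = sizeof(float) * 256 * 13;  /* try 256 * 14 to cross the cliff */
    float *a = malloc(n);
    a[0] = 123.0f;
    free(a);
    float *b = malloc(n);                 /* small sizes often reuse the same chunk */
    printf("b[0] = %f\n", b[0]);          /* may print 123.000000 ... or not */
    free(b);
    return 0;
}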
What you'll probably find is that the previous requests could be filled using smaller blocks joined together. But when you request the bigger block, the existing free memory probably can't handle that much, which flips some built-in switch to request memory directly from the OS.
In C, suppose I have 3 threads: 2 threads that are appending strings to a global char string (char*), and 1 thread that is reading from that char string.
Let's say that the 2 threads are appending about 8,000 strings per second each, and the 3rd thread is reading pretty often too.
Is there any possibility that they will append at exactly the same time and overwrite each other's data or read at the same time and get an incomplete string?
Yes, this will get corrupted pretty quickly.
You should protect access to this string with a mutex or read/write locking mechanism.
I'm not sure what platform you're on but have a look at the pthreads library if you're on a *nix platform.
I don't develop for Windows, so I can't point you at any threading capabilities there (though I know there's plenty of good threading API in Win32).
Edit
@OP: have you considered the memory issues of appending 8,000 strings (you don't state how large each string is) per second? You're going to run out of memory pretty quickly if you're never removing data from your global string. You might want to consider bounding the size of this string somehow, and also setting up some kind of system to remove data from your string (the reader thread would be the best place for this). If you're already doing that, then ignore the above.
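To make the locking advice above concrete, here is a minimal sketch of mutex-protected appends and reads with pthreads; the fixed capacity and the function names are illustrative.

#include <pthread.h>
#include <stddef.h>
#include <string.h>

static char shared[1 << 20];              /* bounded, as suggested above */
static size_t used = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void append_string(const char *s) {
    size_t len = strlen(s);
    pthread_mutex_lock(&lock);
    if (used + len < sizeof shared) {     /* drop the append if the buffer is full */
        memcpy(shared + used, s, len);
        used += len;
        shared[used] = '\0';
    }
    pthread_mutex_unlock(&lock);
}

size_t read_snapshot(char *dst, size_t cap) {
    if (cap == 0)
        return 0;
    pthread_mutex_lock(&lock);
    size_t n = used < cap - 1 ? used : cap - 1;  /* copy what fits, keep room for '\0' */
    memcpy(dst, shared, n);
    dst[n] = '\0';
    pthread_mutex_unlock(&lock);
    return n;
}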
"Is there any possibility that they will append at exactly the same time and overwrite each other's data or read at the same time and get an incomplete string?"
When dealing with concurrent issues, you must always protect your data. You can never leave this kind of stuff to chance. Even if there's 0.1% chance of trouble, it will happen.