How to implement the C malloc/realloc functions properly?

I am writing my own OS and had to implement my own malloc and realloc functions. However, I think that what I have written may not be safe and may also cause a memory leak, because the variable isn't really destroyed: its memory is set to zero, but the variable name still exists. Could someone tell me if there are any vulnerabilities in this code? The project will be added to GitHub as soon as it's finished, under user subado512.
Code:
void * malloc(int nbytes)
{
char variable[nbytes];
return &variable;
}
void * free(string s) {
s= (string)malloc(0);
return &s;
}
void memory_copy(char *source, char *dest, int nbytes) {
int i;
for (i = 0; i < nbytes; i++) {
*(dest + i) = *(source + i); // dest[i] = source[i]
}
}
void *realloc(string s,uint8_t i) {
string ret;
ret=(string)malloc(i);
memory_copy(s,ret,i);
free(s);
return &ret;
}
Context in which the code is used (a bit of pseudocode to increase readability):
string buffstr = (string) malloc(200);
uint8_t i = 0;
while(reading)
{
buffstr=(string)realloc(buffstr,i+128);
buffstr[i]=readinput();
}

Using the pointer returned by your malloc results in undefined behaviour: you are returning the address of an array with automatic storage duration, and that array ceases to exist when the function returns.
As a rough start, consider using a static char array to model your memory pool, and return segments of it to the caller, building up a table of which parts of that array are currently in use. Note that you'll have to do clever things with alignment here to guarantee that the returned void* meets the alignment requirements of any type. free will then be little more than releasing a record in that table.
Do note that the memory management systems that a typical C runtime library uses are very sophisticated. With that in mind, do appreciate that your undertaking may be little more than a good programming exercise.
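As a minimal sketch of the static-pool idea, assuming a C11 compiler: the pool is cut into fixed-size blocks, and a small table records which blocks are in use. The names pool_malloc, pool_free and pool_realloc, and the sizes POOL_SIZE and BLOCK_SIZE, are made up for this illustration; a real kernel allocator would also need locking and its own memset/memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_SIZE  4096
#define BLOCK_SIZE 16                       /* allocation granularity */
#define NUM_BLOCKS (POOL_SIZE / BLOCK_SIZE)

/* Every block start must be suitably aligned for any object type. */
_Static_assert(BLOCK_SIZE % _Alignof(max_align_t) == 0, "block size too small");

static _Alignas(max_align_t) unsigned char pool[POOL_SIZE];
static unsigned char in_use[NUM_BLOCKS];    /* 1 if the block is allocated */
static uint16_t run_length[NUM_BLOCKS];     /* blocks per allocation, stored at its first block */

void *pool_malloc(size_t nbytes)
{
    if (nbytes == 0 || nbytes > POOL_SIZE)
        return NULL;
    size_t need = (nbytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    size_t run = 0;
    /* First fit: look for `need` consecutive free blocks. */
    for (size_t i = 0; i < NUM_BLOCKS; i++) {
        run = in_use[i] ? 0 : run + 1;
        if (run == need) {
            size_t start = i + 1 - need;
            memset(&in_use[start], 1, need);
            run_length[start] = (uint16_t)need;
            return &pool[start * BLOCK_SIZE];
        }
    }
    return NULL;                            /* pool exhausted or too fragmented */
}

/* Map a pointer previously returned by pool_malloc back to its first block. */
static size_t block_index(void *p)
{
    size_t offset = (size_t)((unsigned char *)p - pool);
    assert(offset < POOL_SIZE && offset % BLOCK_SIZE == 0);
    assert(run_length[offset / BLOCK_SIZE] != 0);
    return offset / BLOCK_SIZE;
}

void pool_free(void *p)
{
    if (p == NULL)
        return;
    size_t start = block_index(p);
    memset(&in_use[start], 0, run_length[start]);
    run_length[start] = 0;
}

void *pool_realloc(void *p, size_t nbytes)
{
    if (p == NULL)
        return pool_malloc(nbytes);
    size_t old_bytes = (size_t)run_length[block_index(p)] * BLOCK_SIZE;
    if (nbytes <= old_bytes)
        return p;                           /* already large enough */
    void *q = pool_malloc(nbytes);
    if (q == NULL)
        return NULL;                        /* the old block stays valid */
    memcpy(q, p, old_bytes);                /* copy only what the old block held */
    pool_free(p);
    return q;
}
```

Note that pool_realloc copies only as many bytes as the old allocation held; copying the new, larger size (as the realloc in the question does) would read past the end of the old block.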

Related

Memory Allocation, Recursive Function and Pure C [duplicate]

I know that on your hard drive, if you delete a file, the data is not (instantly) gone. The data is still there until it is overwritten. I was wondering if a similar concept existed in memory. Say I allocate 256 bytes for a string, is that string still floating in memory somewhere after I free() it until it is overwritten?
Your analogy is correct. The data in memory doesn't disappear or anything like that; the values may indeed still be there after a free(), though attempting to read from freed memory is undefined behaviour.
Generally, it does stay around, unless you explicitly overwrite the string before freeing it (like people sometimes do with passwords). Some library implementations automatically overwrite deallocated memory to catch accesses to it, but that is usually only done in debug builds, not in release mode.
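Overwriting a secret before freeing it could look roughly like the sketch below. The helper names scrub and free_secret are invented for this example. The volatile pointer is there because a plain memset directly before free may be removed by the optimizer as a dead store.

```c
#include <assert.h>
#include <stdlib.h>

/* Zero n bytes at p through a volatile pointer so the stores are not elided. */
void scrub(void *p, size_t n)
{
    assert(p != NULL || n == 0);
    volatile unsigned char *vp = p;
    while (n--)
        *vp++ = 0;
}

/* Wipe a heap-allocated secret of n bytes, then release it. */
void free_secret(char *secret, size_t n)
{
    if (secret == NULL)
        return;
    scrub(secret, n);
    free(secret);
}
```

After free_secret returns, the old contents are gone from that block, but copies made elsewhere (for example in I/O buffers) are not affected.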
The answer depends highly on the implementation. On a good implementation, it's likely that at least the beginning (or the end?) of the memory will be overwritten with bookkeeping information for tracking free chunks of memory that could later be reused. However the details will vary. If your program has any level of concurrency/threads (even in the library implementation you might not see), then such memory could be clobbered asynchronously, perhaps even in such a way that even reading it is dangerous. And of course the implementation of free might completely unmap the address range from the program's virtual address space, in which case attempting to do anything with it will crash your program.
From a standpoint of an application author, you should simply treat free according to the specification and never access freed memory. But from the standpoint of a systems implementor or integrator, it might be useful to know (or design) the implementation, in which case your question is then interesting.
If you want to verify the behaviour for your implementation, the simple program below will do that for you.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* The number of memory bytes to test */
#define MEM_TEST_SIZE 256
void outputMem(unsigned char *mem, int length)
{
int i;
for (i = 0; i < length; i++) {
printf("[%02d]", mem[i] );
}
}
int bytesChanged(unsigned char *mem, int length)
{
int i;
int count = 0;
for (i = 0; i < length; i++) {
if (mem[i] != i % 256)
count++;
}
return count;
}
int main(void)
{
int i;
unsigned char *mem = malloc(MEM_TEST_SIZE);
if (mem == NULL)
return 1;
/* Fill memory with bytes */
for (i = 0; i < MEM_TEST_SIZE; i++) {
mem[i] = i % 256;
}
printf("After malloc and copy to new mem location\n");
printf("mem = %p\n", (void *)mem);
printf("Contents of mem: ");
outputMem(mem, MEM_TEST_SIZE);
free(mem);
/* Everything below uses freed memory, which is undefined behaviour;
   it is done here only to observe what this implementation happens to do */
printf("\n\nAfter free()\n");
printf("mem = %p\n", (void *)mem);
printf("Bytes changed in memory = %d\n", bytesChanged(mem, MEM_TEST_SIZE) );
printf("Contents of mem: ");
outputMem(mem, MEM_TEST_SIZE);
return 0;
}

How does MRI Ruby store the contents of a String?

Primer: This question is quite long, because I want to give an overview of my current understanding of the inner mechanisms of MRI and how I came to my conclusions. I want to understand the code better, so please correct me if any assumption I'm making is wrong.
I'm trying to find out where MRI Ruby stores the data part (aka the contents) of a String, because I'd like to create String objects which reuse memory allocated by another binary (same allocator of course).
Here's what I know so far:
RString: internal representation of a String.
struct RString {
struct RBasic basic;
union {
struct {
long len;
char *ptr;
union {
long capa;
VALUE shared;
} aux;
} heap;
char ary[RSTRING_EMBED_LEN_MAX + 1];
} as;
};
reference
From the above snippet I conclude that there are 2 ways the data can be stored:
on the heap via the heap struct (ptr points to data)
in the ary char array directly (probably some optimization)
I'm only interested in the heap case.
str_new0() seems to be the most common way to create a String from a pointer to some string data and a length.
static VALUE
str_new0(VALUE klass, const char *ptr, long len, int termlen)
{
VALUE str;
if (len < 0) {
rb_raise(rb_eArgError, "negative string size (or size too big)");
}
RUBY_DTRACE_CREATE_HOOK(STRING, len);
str = str_alloc(klass);
if (len > RSTRING_EMBED_LEN_MAX) {
RSTRING(str)->as.heap.aux.capa = len;
RSTRING(str)->as.heap.ptr = ALLOC_N(char, len + termlen);
STR_SET_NOEMBED(str);
}
else if (len == 0) {
ENC_CODERANGE_SET(str, ENC_CODERANGE_7BIT);
}
if (ptr) {
memcpy(RSTRING_PTR(str), ptr, len);
}
STR_SET_LEN(str, len);
TERM_FILL(RSTRING_PTR(str) + len, termlen);
return str;
}
reference
Memory is allocated with the macro ALLOC_N which is an alias for RB_ALLOC_N which expands to ruby_xmalloc2() which calls objspace_xmalloc2() which calls objspace_xmalloc0().
Phew
static void *
objspace_xmalloc0(rb_objspace_t *objspace, size_t size)
{
void *mem;
size = objspace_malloc_prepare(objspace, size);
TRY_WITH_GC(mem = malloc(size));
size = objspace_malloc_size(objspace, mem, size);
objspace_malloc_increase(objspace, mem, size, 0, MEMOP_TYPE_MALLOC);
return objspace_malloc_fixup(objspace, mem, size);
}
reference
So here we are. TRY_WITH_GC seems to check if the allocation mem = malloc(size) succeeds and if not it tries again after a GC run I think.
#define TRY_WITH_GC(alloc) do { \
objspace_malloc_gc_stress(objspace); \
if (!(alloc) && \
(!garbage_collect_with_gvl(objspace, TRUE, TRUE, TRUE, GPR_FLAG_MALLOC) || /* full/immediate mark && immediate sweep */ \
!(alloc))) { \
ruby_memerror(); \
} \
} while (0)
reference
Here's the first thing I'm unsure about: It seems to malloc just some memory (important: not in objspace). Is this the case? I don't know if they overwrote malloc somewhere to allocate GC friendly or whatever.
OK after that they mutate objspace with objspace_malloc_increase() and friends. I don't understand what these functions do. They do not seem to store the pointer mem in objspace, but maybe I overlooked it. I need clarification here.
As noted in the beginning, I want to write code that creates a Ruby String which uses memory allocated by some other binary, e.g. C via FFI, of course with the system allocator. Do I have to register my "foreign" memory via the objspace_* functions? If yes, how exactly does that work? And are there subtleties when it comes to freeing the memory again? (I guess the GC does that, but what conditions must be true for this to work?)
I hope my question is not too vague, I can ask more precisely if necessary!
Thanks in advance!

Fortune while returning C arrays?

I'm a newbie with C and am stunned by some magic when using the following C functions. This code works for me and prints all the data.
typedef struct string_t {
char *data;
size_t len;
} string_t;
string_t *generate_test_data(size_t size) {
string_t test_data[size];
for(size_t i = 0; i < size; ++i) {
string_t string;
string.data = "test";
string.len = 4;
test_data[i] = string;
}
return test_data;
}
int main() {
size_t test_data_size = 10;
string_t *test_data = generate_test_data(test_data_size);
for(size_t i = 0; i < test_data_size; ++i) {
printf("%zu: %s\n", test_data[i].len, test_data[i].data);
}
}
Why does the function generate_test_data work only when "test_data_size = 10", while with "test_data_size = 20" the process finishes with exit code 11? How is that possible?
This code will never work perfectly, it just happens to be working. In C, you have to manage the memory yourself. If you make a mistake, the program might continue to work... or something might scribble all over the memory you thought was yours. This often manifests itself as weird errors like you're having: it works when the length is X, but fails when the length is Y.
If you turn on -Wall or, even better if you're using clang, -Weverything, you'll get a warning like this.
test.c:18:12: warning: address of stack memory associated with local variable 'test_data' returned [-Wreturn-stack-address]
return test_data;
^~~~~~~~~
The two important kinds of memory in C are: stack and heap. Very basically, stack memory is only good for the duration of the function. Anything declared on the stack will be freed automatically when the function returns, sort of like local variables in other languages. The rule of thumb is if you don't explicitly allocate it, it's on the stack. string_t test_data[size]; is stack memory.
Heap memory you allocate and free yourself, usually using malloc or calloc or realloc or some other function doing this for you like strdup. Once allocated, heap memory stays around until it's explicitly deallocated.
Rule of thumb: heap memory can be returned from a function, stack memory cannot... well, you can but that memory slot might then be used by something else. That's what's happening to you.
So you need to allocate memory, not just once, but a bunch of times.
Allocate memory for the array of pointers to string_t structs.
Allocate memory for each string_t struct in the array.
Allocate memory for each char string (really an array) in each struct.
And then you have to free all that. Sound like a lot of work? It is! Welcome to C. Sorry. You probably want to write functions to allocate and free string_t.
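static string_t *string_t_new(size_t size) {
/* size is not used yet; the caller sets data and len afterwards */
(void)size;
string_t *string = malloc(sizeof(string_t));
if (string == NULL)
return NULL;
string->data = NULL;
string->len = 0;
return string;
}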
static void string_t_destroy(string_t *self) {
free(self);
}
Now your test data function looks like this.
static string_t **generate_test_data_v3(size_t size) {
/* Allocate memory for the array of pointers */
string_t **test_data = calloc(size, sizeof(string_t*));
for(size_t i = 0; i < size; ++i) {
/* Allocate for the string */
string_t *string = string_t_new(5);
string->data = "test";
string->len = 4;
test_data[i] = string;
}
/* Return a pointer to the array, which is also a pointer */
return test_data;
}
int main() {
size_t test_data_size = 20;
string_t **test_data = generate_test_data_v3(test_data_size);
for(size_t i = 0; i < test_data_size; ++i) {
printf("%zu: %s\n", test_data[i]->len, test_data[i]->data);
}
/* Free each string_t in the array */
for(size_t i = 0; i < test_data_size; i++) {
string_t_destroy(test_data[i]);
}
/* Free the array */
free(test_data);
}
Instead of using pointers you could instead copy all the memory you use, which is sort of what you were previously doing. That's easier for the programmer, but inefficient for the computer. And if you're coding in C, it's all about being efficient for the computer.
Because the space for test_data in v1 gets created in the function, and that space gets reclaimed when the function returns (and can thus be used for other things); in v2, the space is set aside outside of the function, so doesn't get reclaimed.
Why does the function generate_test_data_v1 work only when "test_data_size = 10", while with "test_data_size = 20" the process finishes with exit code 11?
I see no reason why function generate_test_data_v1() should ever fail, but you cannot use its return value. It returns a pointer to an automatic variable belonging to the function's scope, and automatic variables cease to exist when the function to which they belong returns. Your program produces undefined behavior when it dereferences that pointer. I can believe that it appears to work as you intended for some sizes, but even in those cases the program is wrong.
Moreover, your program is very unlikely to be producing an exit code of 11, but it may well be terminating abruptly with a segmentation fault, which is signal 11.
And why does generate_test_data_v2 work perfectly?
Function generate_test_data_v2() populates elements of an existing array belonging to function main(). That array is in scope for substantially the entire life of the program.
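The v2 function itself is not shown in the question as quoted here, so the following is only a guess at its shape: the caller owns the array and passes it in, and the function just fills it. Nothing is returned that could outlive its storage.

```c
#include <assert.h>
#include <stddef.h>

typedef struct string_t {
    char *data;
    size_t len;
} string_t;

/* Fill a caller-provided array of `size` elements; the caller owns the storage. */
void generate_test_data_v2(string_t *test_data, size_t size)
{
    assert(test_data != NULL || size == 0);
    for (size_t i = 0; i < size; ++i) {
        test_data[i].data = "test";   /* string literal: static storage, never freed */
        test_data[i].len = 4;
    }
}
```

A caller would declare string_t test_data[20]; in main() and call generate_test_data_v2(test_data, 20); the array then stays valid until main() returns.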

memory leak (free function not working)

I am facing memory leak problem with the below code
static char **edits1(char *word)
{
int next_idx;
char **array = malloc(edits1_rows(word) * sizeof (char *));
if (!array)
return NULL;
next_idx = deletion(word, array, 0);
next_idx += transposition(word, array, next_idx);
next_idx += alteration(word, array, next_idx);
insertion(word, array, next_idx);
return array;
}
static void array_cleanup(char **array, int rows) {
int i;
for (i = 0; i < rows; i++)
free(array[i]);
}
static char *correct(char *word,int *count) {
char **e1, **e2, *e1_word, *e2_word, *res_word = word;
int e1_rows, e2_rows,max_size;
e1_rows = edits1_rows(word);
if (e1_rows) {
e1 = edits1(word);
*count=(*count)*300;
e1_word = max(e1, e1_rows,*count);
if (e1_word) {
array_cleanup(e1, e1_rows);
free(e1);
return e1_word;
}
}
*count=(*count)/300;
if((*count>5000)||(strlen(word)<=4))
return res_word;
e2 = known_edits2(e1, e1_rows, &e2_rows);
if (e2_rows) {
*count=(*count)*3000;
e2_word = max(e2, e2_rows,*count);
if (e2_word)
res_word = e2_word;
}
array_cleanup(e1, e1_rows);
array_cleanup(e2, e2_rows);
free(e1);
free(e2);
return res_word;
}
I don't know why free() is not working. I am calling this function "correct" in a thread, and multiple threads are running simultaneously. I am using Ubuntu.
You don't show where you allocate the actual arrays, you just show where you allocate the array of pointers. So it is quite possible that you have leaks elsewhere in the code you are not showing.
Furthermore, array_cleanup leaks, since it only frees the individual arrays (the ones whose allocation you don't show). It doesn't free the array of pointers itself. The final line of that function should have been free(array);.
Your main problem is that you are using an obscure allocation algorithm. Instead, allocate true dynamic 2D arrays.
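One way to read "true dynamic 2D arrays" is a single contiguous allocation addressed through a pointer to a variable-length array, sketched below. This assumes every word fits in a fixed number of columns and that the compiler supports VLAs (C99, optional in C11); alloc_2d and demo_2d are names invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocate rows * cols zeroed chars as one block; returns NULL on failure. */
void *alloc_2d(size_t rows, size_t cols)
{
    assert(cols > 0);
    return calloc(rows, cols);
}

/* Store `rows` short strings, check the last one, release all with one free().
   Returns 1 on success, -1 if nothing could be allocated. */
int demo_2d(size_t rows, size_t cols)
{
    if (rows == 0)
        return -1;
    char (*words)[cols] = alloc_2d(rows, cols);
    if (words == NULL)
        return -1;
    for (size_t r = 0; r < rows; r++)
        snprintf(words[r], cols, "word%zu", r);   /* truncates to cols - 1 chars */
    int ok = (words[rows - 1][0] == 'w');
    free(words);                                  /* one free for the whole table */
    return ok;
}
```

With this layout there is exactly one malloc-family call and one free per table, so a cleanup loop like array_cleanup is no longer needed.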
Answer based on digging for further information in comments.
Most malloc implementations usually don't return the memory to the operating system, but rather keep it for future calls to malloc. This is done because returning the memory to the operating system can impact performance quite a lot.
Furthermore, if you have certain allocation patterns, the memory that malloc keeps might not be easily reusable by future calls to malloc. This is called memory fragmentation and is a large topic of research for designing memory allocators.
Whatever htop/top/ps reports is not how much memory you have currently allocated inside your program with malloc, but all the various allocations that all libraries did, their reserves and such, which could be much more than you've allocated.
If you want an accurate assessment of how much memory you are leaking, you need to use a tool like valgrind or see if maybe the malloc you're using has diagnostic tools to help you with that.

How to read from buffer with feedback, so doesn't buffer overflow?

I have this code
#define BUFFER_LEN (2048)
static float buffer[BUFFER_LEN];
int readcount;
while ((readcount = sf_read_float(handle, buffer, BUFFER_LEN))) {
// alsa play
}
which reads BUFFER_LEN floats from buffer, and returns the number of floats it actually read. "handle" tells sf_read_float how big buffer is.
E.g. if buffer contains 5 floats, and BUFFER_LEN is 3, readcount would first return 3, and next time 2, and the while-loop would exit.
I would like to have a function that does the same.
Update
After a lot of coding, I think this is the solution.
#include <stdio.h>
int copy_buffer(double* src, int src_length, int* src_pos,
float* dest, int dest_length) {
int copy_length = 0;
if (src_length - *src_pos > dest_length) {
copy_length = dest_length;
printf("copy_length1 %i\n", copy_length);
} else {
copy_length = src_length - *src_pos;
printf("copy_length2 %i\n", copy_length);
}
for (int i = 0; i < copy_length; i++) {
dest[i] = (float) src[*src_pos + i];
}
// remember where to continue next time the copy_buffer() is called
*src_pos += copy_length;
return copy_length;
}
int main() {
double src[] = {1,2,3,4,5};
int src_length = 5;
float dest[] = {0,0};
int dest_length = 2;
int read;
int src_pos = 0;
read = copy_buffer(src, src_length, &src_pos, dest, dest_length);
printf("read %i\n", read);
printf("src_pos %i\n", src_pos);
for (int i = 0; i < src_length; i++) {
printf("src %f\n", src[i]);
}
for (int i = 0; i < dest_length; i++) {
printf("dest %f\n", dest[i]);
}
return 0;
}
Next time copy_buffer() is called, dest contains 3,4. Running copy_buffer() again only copies the value "5". So I think it works now.
It is not very pretty, though, that I have int src_pos = 0; outside of copy_buffer().
It would be a lot better if I could give copy_buffer() a unique handle instead of &src_pos, just like sndfile does.
Does anyone know how that could be done?
If you would like to create unique handles, you can do so with malloc() and a struct:
#include <stdint.h>
#include <stdlib.h>
typedef intptr_t HANDLE_TYPE;
HANDLE_TYPE init_buffer_traverse(double * src, size_t src_len);
int copy_buffer(HANDLE_TYPE h_traverse, double * dest, size_t dest_len);
void close_handle_buffer_traverse(HANDLE_TYPE h);
typedef struct
{
double * source;
size_t source_length;
size_t position;
} TRAVERSAL;
#define INVALID_HANDLE 0
/*
* Returns a new traversal handle, or 0 (INVALID_HANDLE) on failure.
*
* Allocates memory to contain the traversal state.
* Resets traversal state to beginning of source buffer.
*/
HANDLE_TYPE init_buffer_traverse(double *src, size_t src_len)
{
TRAVERSAL * trav = malloc(sizeof(TRAVERSAL));
if (NULL == trav)
return INVALID_HANDLE;
trav->source = src;
trav->source_length = src_len;
trav->position = 0;
return (HANDLE_TYPE)trav;
}
/*
* Returns the system resources (memory) associated with the traversal handle.
*/
void close_handle_buffer_traverse(HANDLE_TYPE h)
{
if (INVALID_HANDLE != h)
free((TRAVERSAL *)h);
}
int copy_buffer(HANDLE_TYPE h, double * dest, size_t dest_len)
{
TRAVERSAL * trav = NULL;
if (INVALID_HANDLE == h)
return -1;
trav = (TRAVERSAL *)h;
/* Copy at most as many values as remain in the source */
size_t copy_length = trav->source_length - trav->position;
if (dest_len < copy_length)
copy_length = dest_len;
for (size_t i = 0; i < copy_length; i++)
dest[i] = trav->source[trav->position + i];
// remember where to continue next time the copy_buffer() is called
trav->position += copy_length;
return (int)copy_length;
}
This style is what some C coders used before C++ came into being. It involves a data structure which contains all the data elements of our 'class'. Most functions in the API for the class take, as their first argument, a pointer to one of these structs. This pointer is similar to the this pointer in C++. In our example it is the handle h, which the functions convert back to the pointer trav.
The exceptions are those functions which allocate the handle type; these are similar to constructors and have the handle type as a return value. In our case, init_buffer_traverse might as well have been called construct_traversal_handle.
There are many other methods than this method for implementing an "opaque handle" value. In fact, some coders would manipulate the bits (via an XOR, for example) in order to obscure the true nature of the handles. (This obscurity does not provide security where such is needed.)
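One of those other methods, sketched here under the assumption that the API can hand out a pointer: declare the struct as an incomplete type in the header and define it only in the implementation file. Callers can hold and pass the pointer but cannot look inside it. The names traversal_open, traversal_read and traversal_close are hypothetical and not part of libsndfile or any other library.

```c
#include <assert.h>
#include <stdlib.h>

/* --- what would go in the public header: an incomplete type only --- */
typedef struct traversal traversal;
traversal *traversal_open(const double *src, size_t src_len);
size_t traversal_read(traversal *t, double *dest, size_t dest_len);
void traversal_close(traversal *t);

/* --- what would stay private in the .c file --- */
struct traversal {
    const double *source;
    size_t source_length;
    size_t position;
};

/* Returns a new traversal starting at the beginning of src, or NULL on failure. */
traversal *traversal_open(const double *src, size_t src_len)
{
    traversal *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->source = src;
    t->source_length = src_len;
    t->position = 0;
    return t;
}

/* Copies up to dest_len values and returns how many were copied (0 at the end). */
size_t traversal_read(traversal *t, double *dest, size_t dest_len)
{
    assert(t != NULL);
    size_t n = t->source_length - t->position;
    if (dest_len < n)
        n = dest_len;
    for (size_t i = 0; i < n; i++)
        dest[i] = t->source[t->position + i];
    t->position += n;   /* remember where to continue next time */
    return n;
}

void traversal_close(traversal *t)
{
    free(t);            /* free(NULL) is a no-op */
}
```

Because the read function returns 0 once the source is exhausted, it can drive a loop of the same shape as the sf_read_float loop in the question: while ((n = traversal_read(t, dest, dest_len)) > 0) { ... }.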
In the example given, I'm not sure (didn't look at sndlib) whether it would make most sense for the destination buffer pointer and length to be held in the handle structure or not. If so, that would make it a "copy buffer" handle rather than a "traversal" handle and you would want to change all the terminology from this answer.
These handles are only valid for the lifetime of the current process, so they are not appropriate for handles which must survive restarts of the handle server. For that, use an ISAM database and the column ID as handle. The database approach is much slower than the in-memory/pointer approach but for persistent handles, you can't use in-memory values, anyway.
On the other hand, it sounds like you are implementing a library which will be running within a single process lifetime. In which case, the answer I've written should be usable, after modifying to your requirements.
Addendum
You asked for some clarification of the similarity with C++ that I mention above. To be specific, some equivalent (to the above C code) C++ code might be:
#include <cstddef>
class TRAVERSAL
{
double * source;
size_t source_length;
size_t position;
public:
TRAVERSAL(double *src, size_t src_len)
{
source = src;
source_length = src_len;
position = 0;
}
int copy_buffer(double * dest, size_t dest_len)
{
size_t copy_length = source_length - position;
if (dest_len < copy_length)
copy_length = dest_len;
for (size_t i = 0; i < copy_length; i++)
dest[i] = source[position + i];
// remember where to continue next time the copy_buffer() is called
position += copy_length;
return (int)copy_length;
}
};
There are some apparent differences. The C++ version is a little bit less verbose-seeming. Some of this is illusory; the equivalent of close_handle_buffer_traverse is now to delete the C++ object. Of course delete is not part of the class implementation of TRAVERSAL, delete comes with the language.
In the C++ version, there is no "opaque" handle.
The C version is more explicit and perhaps makes more apparent what operations are being performed by the hardware in response to the program execution.
The C version is more amenable to using the cast to HANDLE_TYPE in order to create an "opaque ID" rather than a pointer type. The C++ version could be "wrapped" in an API which accomplished the same thing while adding another layer. In the current example, users of this class will maintain a copy of a TRAVERSAL *, which is not quite "opaque."
In the function copy_buffer(), the C++ version need not mention the trav pointer because instead it implicitly dereferences the compiler-supplied this pointer.
sizeof(TRAVERSAL) should be the same for both the C and C++ examples -- with no vtable, also assuming run-time-type-identification for C++ is turned off, the C++ class contains only the same memory layout as the C struct in our first example.
It is less common to use the "opaque ID" style in C++, because the penalty for "transparency" is lower in C++. The data members of class TRAVERSAL are private and so the TRAVERSAL * cannot be accidentally used to break our API contract with the API user.
Please note that both the opaque ID and the class pointer are vulnerable to abuse from a malicious API user -- either the opaque ID or class pointer could be cast directly to, e.g., double **, allowing the holder of the ID to change the source member directly via memory. Of course, you must trust the API caller already, because in this case the API calling code is in the same address space. In an example of a network file server, there could be security implications if "opaque ID" based on a memory address is exposed to the outside.
I would not normally make the digression into trustedness of the API user, but I want to clarify that the C++ keyword private has no "enforcement powers," it only specifies an agreement between programmers, which the compiler respects also unless told otherwise by the human.
Finally, the C++ class pointer can be converted to an opaque ID as follows:
typedef intptr_t HANDLE_TYPE;
HANDLE_TYPE init_buffer_traverse(double *src, size_t src_len)
{
return (HANDLE_TYPE)(new TRAVERSAL(src, src_len));
}
int copy_buffer(HANDLE_TYPE h_traverse, double * dest, size_t dest_len)
{
return ((TRAVERSAL *)h_traverse)->copy_buffer(dest, dest_len);
}
void close_handle_buffer_traverse(HANDLE_TYPE h)
{
delete ((TRAVERSAL *)h);
}
And now our brevity of "equivalent" C++ may be further questioned.
What I wrote about the old style of C programming which relates to C++ was not meant to say that C++ is better for this task. I only mean that encapsulation of data and hiding of implementation details could be done in C via a style that is almost isomorphic to a C++ style. This can be good to know if you find yourself programming in C but unfortunately having learned C++ first.
PS
I just noticed that our implementation to date had used:
dest[i] = (float)source[position + i];
when copying the bytes. Because both dest and source are double * (that is, they both point to double values), there is no need for a cast here. Also, casting from double to float may lose digits of precision in the floating-point representation. So this is best removed and restated as:
dest[i] = source[position + i];
I started to look at it, but you could probably do it just as well: libsndfile is open source, so one could look at how sf_read_float() works and create a function that does the same thing from a buffer. http://www.mega-nerd.com/libsndfile/ has a download link.

Resources