Copying very large strings in memory [closed]

I'm trying to implement a solution to copy a large string in memory in C.
Can you give me any advice about implementation or any reference?
I'm thinking of copying byte by byte since I don't know the length (I probably can't calculate it with strlen(), since the string is very large).
Another concern is that I will have to reallocate memory on every step and I don't know how is the best way to do that. Is there any way that I can reallocate using only the reference to the last position of the memory already allocated and filled? Thus if the memory allocation fails, it will not affect the rest of the memory already filled.
What is the best value to return from this function? Should I return the number of bytes that were successfully copied?
If there is a memory allocation fail, does realloc() set any global variable that I can check in the main function after I call the copying function? As I don't want to just return NULL from it if at some point realloc() fails, but I want to return a value more useful.

strlen() won't fail, as it uses size_t to describe the string's size, and size_t is large enough to hold the size of any object on the machine the program runs on.
So simply do
#define _XOPEN_SOURCE 500 /* for strdup */
#include <string.h>
int duplicate_string(const char * src, char ** pdst)
{
int result = 0;
if (NULL == ((*pdst) = strdup(src)))
{
result = -1;
}
return result;
}
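strdup() is a POSIX function, which is why the feature-test macro is needed above. Where it is not available, the same idea can be sketched with only strlen(), malloc() and memcpy(). The name duplicate_string_portable is made up for this illustration; it keeps the same return convention as duplicate_string().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate src using only standard C library calls.
   Returns 0 and stores the copy in *pdst on success, -1 on failure. */
int duplicate_string_portable(const char *src, char **pdst)
{
    size_t size = strlen(src) + 1;   /* +1 for the 0-terminator */
    char *copy = malloc(size);

    if (copy == NULL) {
        *pdst = NULL;
        return -1;
    }
    memcpy(copy, src, size);         /* one bulk copy instead of byte by byte */
    *pdst = copy;
    return 0;
}
```

The caller owns the copy and releases it with free().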
If this fails, try using a more clever structure to hold the data, for example by chopping it into slices:
#define _XOPEN_SOURCE 700 /* for strndup */
#include <stdlib.h> /* for calloc, free */
#include <string.h>
int slice_string(const char * src, char *** ppdst, size_t s)
{
int result = 0;
size_t len = strlen(src);
size_t n;
if (0 == s)
{
*ppdst = NULL; /* A slice length of 0 could never make progress. */
return -1;
}
n = len/s + (len%s ?1 :0); /* Number of slices of at most s characters each. */
*ppdst = calloc(n + 1, sizeof(**ppdst)); /* +1 to have a stopper element. */
if (NULL == (*ppdst))
{
result = -1;
goto lblExit;
}
for (size_t i = 0; i < n; ++i)
{
(*ppdst)[i] = strndup(src, s);
if (NULL == (*ppdst)[i])
{
result = -1;
while (i > 0)
{
free((*ppdst)[--i]); /* Frees slices i-1 down to 0 and avoids size_t wrap-around. */
}
free(*ppdst);
*ppdst = NULL;
goto lblExit;
}
src += s;
}
lblExit:
return result;
}
Use these functions by trying the simple copy first and, if that fails, by slicing the string.
int main(void)
{
char * s = NULL;
read_big_string(&s);
int result = 0;
char * d = NULL;
char ** pd = NULL;
/* 1st try dump copy. */
result = duplicate_string(s, &d);
if (0 != result)
{
/*2ndly try to slice it. */
{
size_t len = strlen(s);
do
{
len = len/2 + (len%2 ?1 :0);
result = slice_string(s, &pd, len);
} while ((0 != result) && (1 < len)); /* Retry with smaller slices until it works or the slices cannot shrink any further. */
}
}
if (0 != result)
{
fprintf(stderr, "Duplicating the string failed.\n");
}
/* Use copies. */
if (NULL != d)
{
/* Use result from simple duplication. */
}
if (NULL != pd)
{
/* Use result from sliced duplication. */
}
/* Free the copies. */
if (NULL != pd)
{
for (size_t i = 0; pd[i]; ++i)
{
free(pd[i]);
}
}
free(pd);
free(d);
return 0;
}
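Once the string lives in slices, the "use the copies" step needs some way to address characters across slice boundaries. A minimal sketch, assuming the NULL-terminated slice array produced above in which every slice except possibly the last holds exactly s characters; slices_length and slices_char_at are hypothetical helper names.

```c
#include <stddef.h>
#include <string.h>

/* Total number of characters stored across all slices. */
size_t slices_length(char *const *slices)
{
    size_t total = 0;
    for (size_t i = 0; slices != NULL && slices[i] != NULL; ++i)
        total += strlen(slices[i]);
    return total;
}

/* Character at position index of the original string, given that every
   slice except possibly the last holds exactly s characters.
   Returns '\0' when index is out of range or s is 0. */
char slices_char_at(char *const *slices, size_t s, size_t index)
{
    size_t which, offset;

    if (slices == NULL || s == 0)
        return '\0';
    which = index / s;               /* slice that holds the character */
    offset = index % s;              /* position inside that slice */
    for (size_t i = 0; i <= which; ++i)
        if (slices[i] == NULL)
            return '\0';             /* ran past the stopper element */
    if (offset >= strlen(slices[which]))
        return '\0';
    return slices[which][offset];
}
```

Walking the array up to the requested slice keeps the lookup safe without storing the slice count separately, at the cost of a linear scan.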

realloc() failing
If there is a memory allocation fail, does realloc() set any global variable that I can check in the main function after I call the copying function? As I don't want to just return NULL from it if at some point realloc() fails, but I want to return a value more useful.
There's no problem with realloc() returning null if you use realloc() correctly. If you use realloc() incorrectly, you get what you deserve.
Incorrect use of realloc()
char *space = malloc(large_number);
space = realloc(space, even_larger_number);
If the realloc() fails, this code has overwritten the only reference to the previously allocated space with NULL, so not only have you failed to allocate new space but you also cannot release the old space because you've lost the pointer to it.
(For the fastidious: the fact that the original malloc() might have failed is not critical; space will be NULL, but that's a valid first argument to realloc(). The only difference is that there would be no previous allocation that was lost.)
Correct use of realloc()
char *space = malloc(large_number);
char *new_space = realloc(space, even_larger_number);
if (new_space != 0)
space = new_space;
This saves and tests the result of realloc() before overwriting the value in space.
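As for a "more useful" return value: one option is to wrap the pattern in a small function that reports success or failure and never disturbs the caller's pointer on failure. This is only a sketch; resize_buffer is a hypothetical name. POSIX specifies that realloc() sets errno to ENOMEM when it fails, but ISO C does not require that, so the return value remains the portable signal.

```c
#include <stdlib.h>

/* Hypothetical wrapper: resize *buf to new_size bytes.
   Returns 0 on success and -1 on failure; on failure *buf is left
   untouched and still points to the old, valid block. */
int resize_buffer(char **buf, size_t new_size)
{
    char *tmp;

    if (new_size == 0)
        return -1;                   /* realloc(p, 0) is not portable, so refuse it */
    tmp = realloc(*buf, new_size);   /* realloc(NULL, n) acts like malloc(n) */
    if (tmp == NULL)
        return -1;
    *buf = tmp;
    return 0;
}
```

A copying function built on this can return the number of bytes it managed to store, because everything written before a failed resize is still reachable through *buf.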
Continually growing memory
Another concern is that I will have to reallocate memory on every step and I don't know how is the best way to do that. Is there any way that I can reallocate using only the reference to the last position of the memory already allocated and filled? Thus if the memory allocation fails, it will not affect the rest of the memory already filled.
The standard technique for avoiding quadratic behaviour (which really does matter when you're dealing with megabytes of data) is to double the space allocated for your working string when you need to grow it. You do that by keeping three values:
Pointer to the data.
Size of the data area that is allocated.
Size of the data area that is in use.
When the incoming data won't fit in the space that is unused, you reallocate the space, doubling the amount that is allocated unless you need more than that for the new space. If you think you're going to be adding more data later, then you might add double the new amount. This amortizes the cost of the memory allocations, and saves copying the unchanging data as often.
struct String
{
char *data;
size_t length;
size_t allocated;
};
int add_data_to_string(struct String *str, char const *data, size_t datalen)
{
if (str->length + datalen >= str->allocated)
{
size_t newlen = 2 * (str->allocated + datalen + 1);
char *newdata = realloc(str->data, newlen);
if (newdata == 0)
return -1;
str->data = newdata;
str->allocated = newlen;
}
memcpy(str->data + str->length, data, datalen + 1);
str->length += datalen;
return 0;
}
When you've finished adding to the string, you can release the unused space if you wish:
void release_unused(struct String *str)
{
char *data = realloc(str->data, str->length + 1);
if (data != 0) /* If shrinking fails, keep the old (larger) block. */
{
str->data = data;
str->allocated = str->length + 1;
}
}
It is very unlikely that shrinking a memory block will move it, but the standard says:
The realloc function deallocates the old object pointed to by ptr and returns a
pointer to a new object that has the size specified by size. The contents of the new
object shall be the same as that of the old object prior to deallocation, up to the lesser of
the new and old sizes.
The realloc function returns a pointer to the new object (which may have the same
value as a pointer to the old object), or a null pointer if the new object could not be
allocated.
Note that 'may have the same value as a pointer to the old object' also means 'may have a different value from a pointer to the old object'.
The code assumes that it is dealing with null terminated strings; the memcpy() code copies the length plus one byte to collect the terminal null, for example, and the release_unused() code keeps a byte for the terminal null. The length element is the value that would be returned by strlen(), but it is crucial that you don't keep doing strlen() on megabytes of data. If you are dealing with binary data, you handle things subtly differently.
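To get a feel for why doubling matters, the following sketch merely counts how many reallocations each strategy would need to reach a target size, without allocating anything. Both function names are invented for this illustration, and the starting capacity of 1 byte is an arbitrary assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of reallocations needed to grow from 1 byte to at least
   target bytes when the capacity doubles each time. */
size_t reallocs_when_doubling(size_t target)
{
    size_t capacity = 1, count = 0;

    while (capacity < target && capacity <= SIZE_MAX / 2) {
        capacity *= 2;
        count++;
    }
    return count;
}

/* Same, but growing by a fixed step each time.
   Returns 0 if step is 0, since no progress would be possible. */
size_t reallocs_with_fixed_step(size_t target, size_t step)
{
    size_t capacity = 1, count = 0;

    if (step == 0)
        return 0;
    while (capacity < target && capacity <= SIZE_MAX - step) {
        capacity += step;
        count++;
    }
    return count;
}
```

Reaching 1024 bytes takes 10 reallocations with doubling and 1023 when growing one byte at a time; since each reallocation may copy everything stored so far, the second strategy is the one that turns quadratic.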

Use a smart pointer and avoid copying in the first place.

OK, let's use Cunningham's Question to help figure out what to do. Cunningham's Question (or Query - your choice :-) is:
What's the simplest thing that could possibly work?
-- Ward Cunningham
IMO the simplest thing that could possibly work would be to allocate a large buffer, suck the string into the buffer, reallocate the buffer down to the actual size of the string, and return a pointer to that buffer. It's the caller's responsibility to free the buffer they get when they're done with it. Something on the order of:
#define BIG_BUFFER_SIZE 100000000
char *read_big_string(FILE *f) /* read a big string from a file */
{
char *buf = malloc(BIG_BUFFER_SIZE);
fgets(buf, BIG_BUFFER_SIZE, f);
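char *shrunk = realloc(buf, strlen(buf)+1); /* realloc() may move the block, so its return value must be used */
return shrunk ? shrunk : buf; /* if shrinking fails, the original block is still valid */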
}
This is example code only. There are #includes which are not included, and there's a fair number of possible errors which are not handled in the above, the implementation of which are left as an exercise for the reader. Your mileage may vary. Dealer contribution may affect cost. Check with your dealer for price and options available in your area. Caveat codor.
Share and enjoy.
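If reserving a huge buffer up front is not acceptable, the same idea can be combined with a buffer that grows on demand. The following is a sketch under assumptions, not a drop-in replacement: read_line_dynamic is a hypothetical name, the starting capacity of 64 is arbitrary, and like fgets() it stops at the first newline.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one line of any length from f into a buffer
   that doubles whenever it fills up. Returns a malloc()ed string that the
   caller must free(), or NULL if nothing could be read or memory ran out. */
char *read_line_dynamic(FILE *f)
{
    size_t cap = 64;                 /* arbitrary starting capacity */
    size_t len = 0;                  /* characters stored so far */
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;
    for (;;) {
        size_t room = cap - len;

        if (room < 2) {              /* fgets() needs space for 1 char + '\0' */
            char *tmp;
            if (cap > SIZE_MAX / 2 || (tmp = realloc(buf, cap * 2)) == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
            room = cap - len;
        }
        if (room > INT_MAX)
            room = INT_MAX;          /* fgets() takes an int count */
        if (fgets(buf + len, (int)room, f) == NULL)
            break;                   /* end of input or read error */
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n')
            break;                   /* a complete line has been read */
    }
    if (len == 0) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

The returned string keeps its trailing newline, if there was one, exactly as fgets() would.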

Related

Allocating and freeing memory inside loop in C [closed]

I have a question about how to make a temporary string in C. What I mean is I would like to create a string on every iteration step and free the variable after it is not useful anymore.
I have seen questions similar to this one, but they differ in significant ways.
So right now I have something similar to this:
for (int i = 0; i < array_size; i++) {
//array1 and array2 are arrays of strings
char* temporary_value = make_hash(array1[i], array2[i], size[i]);
if (is_valid(temporary_value)) {
//Code that doesn't interfere with memory, but uses temporary_value - mostly just compares to it
}
free(temporary_value);
}
Where make_hash mallocs the memory depending on size[i].
But it feels so wrong and sometimes causes a segmentation fault.
My ideas to improve this are:
Make string array and free it after the loop
Put "make_hash" code inside the for-loop and just realloc memory during iteration and free the temporary_value after the for-loop
But these solutions seem to be also bad. How would you approach this kind of problem?

When functions return objects of a known size, it is often better to let the caller handle allocations rather than the functions themselves, since the caller often knows what kind of allocation is best (automatic, static, heap, etc.). Just pass the pointer to where you want the result when calling the function.
Hash functions often return hashes of a fixed size, so I would go for this:
for (int i = 0; i < array_size; i++) {
char buffer[HASH_SIZE];
/*
`make_hash` writes result into `buffer` and returns `buffer` on success, or
`NULL` on error
*/
char* temporary_value = make_hash(buffer, array1[i], array2[i], size[i]);
if (is_valid(temporary_value)) {
//Code that doesn't interfere with memory, but uses temporary_value - mostly just compares to it
}
}
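For illustration, here is what a make_hash-style function with a caller-supplied buffer could look like. Everything in it is an assumption: the name make_hash_into, the extra buffer_size parameter, the HASH_SIZE of 9, and the toy 32-bit FNV-1a style hash, which merely stands in for whatever the real make_hash computes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define HASH_SIZE 9  /* hypothetical: 8 hex digits + terminating '\0' */

/* Mix the bytes of s into the running hash value h. */
static uint32_t fnv1a_update(uint32_t h, const char *s)
{
    while (*s != '\0') {
        h ^= (unsigned char)*s++;
        h *= 16777619u;              /* 32-bit FNV prime */
    }
    return h;
}

/* Writes a hex digest of a and b into buffer and returns buffer,
   or returns NULL if the buffer is missing or too small. */
char *make_hash_into(char *buffer, size_t buffer_size,
                     const char *a, const char *b)
{
    uint32_t h = 2166136261u;        /* 32-bit FNV offset basis */

    if (buffer == NULL || buffer_size < HASH_SIZE)
        return NULL;
    h = fnv1a_update(h, a);
    h = fnv1a_update(h, b);
    snprintf(buffer, buffer_size, "%08lx", (unsigned long)h);
    return buffer;
}
```

Because the buffer is an automatic array in the loop body, nothing needs to be freed and there is no allocation that can fail per iteration.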
In case your hash function does not return a fixed-size hash value, and you want the possibility to realloc your buffer, then pass a pointer to a pointer to your buffer, together with a pointer to a variable holding the buffer size:
char *make_hash(char **buffer, size_t *buffer_size, const char *str1, const char *str2, size_t s)
{
size_t new_size = .....;
if (new_size > *buffer_size)
{
char *tmp = realloc(*buffer, new_size);
if (!tmp)
return NULL;
*buffer = tmp;
*buffer_size = new_size;
}
/*
Calculate hash, and store it wherever `b` is pointing
*/
char *b = *buffer;
.......
return b; /* or `NULL` on error */
}
char *buffer = NULL;
size_t buffer_size = 0;
for (int i = 0; i < array_size; i++) {
char* temporary_value = make_hash(&buffer, &buffer_size, array1[i], array2[i], size[i]);
if (is_valid(temporary_value)) {
//Code that doesn't interfere with memory, but uses temporary_value - mostly just compares to it
}
}
free(buffer);
If you have a cheap way of calculating the hash size, without calling make_hash() you could also go for the first solution together with a variable-length array:
for (int i = 0; i < array_size; i++) {
size_t buffer_size = hash_size(.....);
char buffer[buffer_size];
/*
`make_hash` writes result into `buffer` and returns `buffer` on success, or
`NULL` on error
*/
char* temporary_value = make_hash(buffer, array1[i], array2[i], size[i]);
if (is_valid(temporary_value)) {
//Code that doesn't interfere with memory, but uses temporary_value - mostly just compares to it
}
}

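How I would approach this is not freeing within the loop. malloc() and free() are library functions rather than system calls, but they can still be relatively expensive (and may need to ask the operating system for memory), so you do not want to call them on every iteration if you know that you will have to allocate again.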
The correct way to do this would be malloc once, and then realloc on subsequent calls using the same block of memory, and then free once you're outside the loop.

The segmentation fault might come from a number of factors. Are the arrays array1, array2 and size all the same length? You are also freeing memory without checking whether it was allocated.
I would rather have something like this:
if (temporary_value != NULL) {
free(temporary_value);
}
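It has been a long time since I coded in C, but that should help in troubleshooting. (Strictly speaking, free(NULL) is a no-op, so this check is for clarity rather than safety.)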

How to concat byte arrays in C

My current concat function:
char* concat(char* a, int a_size,
char* b, int b_size) {
char* c = malloc(a_size + b_size);
memcpy(c, a, a_size);
memcpy(c + a_size, b, b_size);
free(a);
free(b);
return c;
}
But this uses extra memory. Is it possible to append two byte arrays using realloc without making extra memory space?
Like:
void append(char* a, int a_size, char* b, int b_size)
...
char* a = malloc(2);
char* b = malloc(2);
append(a, 2, b, 2);
//The size of a will be 4.

While Jean-François Fabre answered the stated question, I'd like to point out that you can manage such byte arrays better by using a structure:
typedef struct {
size_t max; /* Number of chars allocated for */
size_t len; /* Number of chars in use */
unsigned char *data;
} bytearray;
#define BYTEARRAY_INIT { 0, 0, NULL }
void bytearray_init(bytearray *barray)
{
barray->max = 0;
barray->len = 0;
barray->data = NULL;
}
void bytearray_free(bytearray *barray)
{
free(barray->data);
barray->max = 0;
barray->len = 0;
barray->data = NULL;
}
To declare an empty byte array, you can use either bytearray myba = BYTEARRAY_INIT; or bytearray myba; bytearray_init(&myba);. The two are equivalent.
When you no longer need the array, call bytearray_free(&myba);. Note that free(NULL) is safe and does nothing, so it is perfectly safe to free a bytearray that you have initialized, but not used.
To append to a bytearray:
int bytearray_append(bytearray *barray, const void *from, const size_t size)
{
if (barray->len + size > barray->max) {
const size_t len = barray->len + size;
size_t max;
void *data;
/* Example policy: */
if (len < 8)
max = 8; /* At least 8 chars, */
else
if (len < 4194304)
max = (3*len) / 2; /* grow by 50% up to 4,194,304 bytes, */
else
max = (len | 2097151) + 2097153 - 24; /* then pad to next multiple of 2,097,152 sans 24 bytes. */
data = realloc(barray->data, max);
if (!data) {
/* Not enough memory available. Old data is still valid. */
return -1;
}
barray->max = max;
barray->data = data;
}
/* Copy appended data; we know there is room now. */
memmove(barray->data + barray->len, from, size);
barray->len += size;
return 0;
}
Since this function can at least theoretically fail to reallocate memory, it will return 0 if successful, and nonzero if it cannot reallocate enough memory.
There is no need for a malloc() call, because realloc(NULL, size) is exactly equivalent to malloc(size).
The "growth policy" is a very debatable issue. You can just make max = barray->len + size, and be done with it. However, dynamic memory management functions are relatively slow, so in practice, we don't want to call realloc() for every small little addition.
The above policy tries to do something better, but not too aggressive: it always allocates at least 8 characters, even if less is needed. Up to 4,194,304 characters, it allocates 50% extra. Above that, it rounds the allocation size to the next multiple of 2,097,152 and subtracts 24. The reasoning behind this is complex, but it is more for illustration and understanding than anything else; it is definitely NOT "this is best, and this is what you should do too". This policy ensures that each byte array allocates at most 4,194,304 = 2^22 unused characters. However, 2,097,152 = 2^21 is the size of a huge page on AMD64 (x86-64), and is a power-of-two multiple of a native page size on basically all architectures. It is also large enough to switch from so-called sbrk() allocation to memory mapping on basically all architectures that do that. It means that such huge allocations use a separate part of the heap for each, and the unused part is usually just virtual memory, not necessarily backed by any RAM, until accessed. As a result, this policy tends to work quite well for both very short byte arrays, and very long byte arrays, on most architectures.
Of course, if you know (or measure!) the typical size of the byte arrays in typical workloads, you can optimize the growth policy for that, and get even better results.
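Isolating the policy in its own function makes it easy to experiment with or replace. The sketch below simply repeats the example policy from bytearray_append(); the name next_capacity is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: given the number of bytes needed, return the
   number of bytes to allocate according to the example growth policy. */
size_t next_capacity(size_t len)
{
    if (len < 8)
        return 8;                               /* at least 8 bytes */
    if (len < 4194304)
        return (3 * len) / 2;                   /* grow by 50% below 4 MiB */
    return (len | 2097151) + 2097153 - 24;      /* pad to a 2 MiB multiple, minus 24 */
}
```

For example, a request for 100 bytes yields 150, and a request for exactly 4,194,304 bytes yields 8,388,584, which is 8,388,608 minus 24. Swapping in a different policy then only means changing this one function.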
Finally, it uses memmove() instead of memcpy(), just in case someone wishes to repeat a part of the same byte array: memcpy() only works if the source and target areas do not overlap; memmove() works even in that case.
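A small illustration of the overlapping case: shifting the contents of a buffer to the right within the same buffer. The source and destination ranges overlap whenever the shift is smaller than the length, so memmove() is the appropriate call here. shift_right is a made-up name for this sketch.

```c
#include <string.h>

/* Hypothetical helper: copy the first len bytes of buf to buf + by.
   buf must have room for at least len + by bytes. The ranges overlap
   when by < len, which memmove() handles and memcpy() does not. */
void shift_right(unsigned char *buf, size_t len, size_t by)
{
    memmove(buf + by, buf, len);
}
```

Shifting the six bytes "abcdef" right by two inside an eight-byte buffer leaves "ababcdef": the first two bytes are untouched and the rest is the moved copy.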
When using more advanced data structures, like hash tables, a variant of the above structure is often useful. (That is, this is much better in cases where you have lots of empty byte arrays.)
Instead of having a pointer to the data, the data is part of the structure itself, as a C99 flexible array member:
typedef struct {
size_t max;
size_t len;
unsigned char data[];
} bytearray;
You cannot declare a byte array itself (i.e. bytearray myba; will not work); you always declare a pointer to such a byte array: bytearray *myba = NULL;. The pointer being NULL is just treated the same as an empty byte array.
In particular, to see how many data items such an array has, you use an accessor function (also defined in the same header file as the data structure), rather than myba.len:
static inline size_t bytearray_len(bytearray *const barray)
{
return (barray) ? barray->len : 0;
}
static inline size_t bytearray_max(bytearray *const barray)
{
return (barray) ? barray->max : 0;
}
The (expression) ? (if-true) : (if-false) is a ternary operator. In this case, the first function is exactly equivalent to
static inline size_t bytearray_len(bytearray *const barray)
{
if (barray)
return barray->len;
else
return 0;
}
If you wonder about the bytearray *const barray, remember that pointer declarations are read from right to left, with * as "a pointer to". So, it just means that barray is constant, a pointer to a byte array. That is, we may change the data it points to, but we won't change the pointer itself. Compilers can usually detect such stuff themselves, but it may help; the main point is however to remind us human programmers that the pointer itself is not to be changed. (Such changes would only be visible within the function itself.)
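The difference between a constant pointer and a pointer to constant data can be seen in a tiny sketch. struct counter and both functions are invented for this illustration; the commented-out lines are the ones a compiler would reject.

```c
#include <stddef.h>

struct counter { size_t hits; };

/* c is a constant pointer to a modifiable struct counter. */
void counter_bump(struct counter *const c)
{
    c->hits++;          /* fine: the pointed-to data may change */
    /* c = NULL; */     /* would not compile: the pointer itself is const */
}

/* c is a modifiable pointer to a constant struct counter. */
size_t counter_read(const struct counter *c)
{
    /* c->hits++; */    /* would not compile: the pointed-to data is const */
    return c->hits;
}
```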
Since such arrays often need to be resized, the resizing is often put into a separate helper function:
bytearray *bytearray_resize(bytearray *const barray, const size_t len)
{
bytearray *temp;
if (!len) {
free(barray);
errno = 0;
return NULL;
}
if (!barray) {
temp = malloc(sizeof (bytearray) + len * sizeof barray->data[0]);
if (!temp) {
errno = ENOMEM;
return NULL;
}
temp->max = len;
temp->len = 0;
return temp;
}
if (barray->len > len)
barray->len = len;
if (barray->max == len)
return barray;
temp = realloc(barray, sizeof (bytearray) + len * sizeof barray->data[0]);
if (!temp) {
free(barray);
errno = ENOMEM;
return NULL;
}
temp->max = len;
return temp;
}
What does that errno = 0 do in there? The idea is that because resizing/reallocating a byte array may change the pointer, we return the new one. If the allocation fails, we return NULL with errno == ENOMEM, just like malloc()/realloc() do. However, since the desired new length was zero, this saves memory by freeing the old byte array if any, and returns NULL. But since that is not an error, we set errno to zero, so that it is easier for callers to check if an error occurred or not. (If the function returns NULL, check errno. If errno is nonzero, an error occurred; you can use strerror(errno) to get a descriptive error message.)
You probably also noted the sizeof barray->data[0], used even when barray is NULL. This is okay, because sizeof is not a function, but an operator: it does not access the right side at all, it only evaluates to the size of the thing the right side refers to. (You only need to use parentheses when the right side is a type.) This form is nice, because it lets a programmer change the type of the data member, without changing any other code.
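A minimal sketch of that property, using an invented struct sample with a flexible array member: the size computation goes through a null pointer, yet nothing is dereferenced at run time because the operands of sizeof are not evaluated here.

```c
#include <stddef.h>

struct sample {
    size_t len;
    double values[];   /* C99 flexible array member */
};

/* Hypothetical helper: bytes needed for a struct sample holding count values. */
size_t sample_bytes(size_t count)
{
    struct sample *none = NULL;   /* never dereferenced at run time */

    return sizeof *none + count * sizeof none->values[0];
}
```

If the element type of values later changes from double to something else, sample_bytes() stays correct without edits.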
To append data to such a byte array, we probably want to be able to specify whether we anticipate further appends to the same array, or whether this is probably the final append, so that only the exact needed amount of memory is allocated. The function below takes a flag, exact, for that purpose. Note that this function returns a pointer to the (modified) byte array:
bytearray *bytearray_append(bytearray *barray,
const void *from, const size_t size,
int exact)
{
size_t len = bytearray_len(barray) + size;
if (exact) {
barray = bytearray_resize(barray, len);
if (!barray)
return NULL; /* errno already set by bytearray_resize(). */
} else
if (bytearray_max(barray) < len) {
if (!exact) {
/* Apply growth policy */
if (len < 8)
len = 8;
else
if (len < 4194304)
len = (3 * len) / 2;
else
len = (len | 2097151) + 2097153 - 24;
}
barray = bytearray_resize(barray, len);
if (!barray)
return NULL; /* errno already set by the bytearray_resize() call */
}
if (size) {
memmove(barray->data + barray->len, from, size);
barray->len += size;
}
return barray;
}
This time, we declared bytearray *barray, because we change where barray points to in the function. If the fourth parameter, exact, is nonzero, then the resulting byte array is exactly the size needed; otherwise the growth policy is applied.

Yes, since realloc will preserve the start of your buffer if the new size is bigger:
char* concat(char* a, size_t a_size,
char* b, size_t b_size) {
char* c = realloc(a, a_size + b_size);
if (c == NULL)
return NULL; // realloc failed: a and b are untouched and still owned by the caller
memcpy(c + a_size, b, b_size); // dest is after "a" data, source is b with b_size
free(b);
return c;
}
c may be different from a (if the original memory block cannot be resized in-place contiguously to the new size by the system) but if that's the case, the location pointed by a will be freed (you must not free it), and the original data will be "moved".
My advice is to warn the users of your function that the input buffers must be allocated using malloc, else it will crash badly.

Function that reads an array until 0 is entered

I want to make a program that dynamically allocates memory for each element of an array while it is entered from stdin and stored into an array. The reading should stop when 0 is entered. If I try to make it directly in main(), it looks like this:
int *a;
int i = 0;
a = malloc(sizeof(int));
do
{
scanf("%d", &a[i]);
a = realloc(a, (i + 2) * sizeof(int)); // enough space for storing another number
i++;
} while (a[i-1] != 0);
But I don't know how to make a function that does this. This is what I've tried, but it crashes every time.
void read(int **a, int *cnt)
{
a = malloc(sizeof(int));
*a = malloc(sizeof(int));
*cnt = 0;
do
{
scanf("%d", a[*cnt]);
*a = realloc(*a, (*cnt + 2) * sizeof(int)); // enough space for storing another number
(*cnt)++;
} while (a[*cnt-1] != 0);
}

How about putting everything inside a function and returning a?
int *read()
{
int *a;
int i = 0;
a = malloc(sizeof(int));
if( !a ) return NULL;
do
{
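if( scanf("%d", &a[i]) != 1 ) a[i] = 0; // treat end of input or invalid input like the terminating 0
int *tmp = realloc(a, (i + 2) * sizeof(int)); // enough space for storing another number
if( !tmp ) { free(a); return NULL; } // on failure realloc leaves the old block allocated, so free it
a = tmp;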
i++;
} while (a[i-1] != 0);
return a;
}

Assuming you are calling this in the usual way:
void read(int **a, int *cnt)
{
a = malloc(sizeof(int)); // This overwrites local a disconnecting it from the main a
*a = malloc(sizeof(int)); // so this will only change the memory pointed by local a and leak memory
...
}
int main()
{
int *a;
int cnt = 0;
read(&a, &cnt);
...
}
What is happening is that you’re giving the address of the pointer a to the function, and then in the function you’re immediately overwriting it with the memory allocation. After this, the a in the function and the a in main are completely separate entities. If you then allocate memory for *a, you’re only storing that through the local a, and the main a will remain pointing to whatever it happened to be. So it is uninitialized, and using it causes undefined behavior.
So remove the line a = malloc(sizeof(int)) and your code will properly affect the main a also.
You also have to use *a for everything in read, including scanf and while. So it might be better to make the function handle allocation and return a pointer as was suggested in another answer.
Also note that you should check the return value of realloc so you won’t leak memory or crash there, and you should use sizeof(int*) when allocating space for a pointer, even if int and int* happen to have the same size. It is clearer.
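Putting these points together, a corrected version could look roughly like the sketch below. It is not the asker's exact function: the name read_until_zero, the FILE * parameter, the status return value, the doubling capacity, and the choice not to store the terminating 0 are all assumptions made for the illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read ints from in until 0, end of input, or
   invalid input. On success returns 0, stores the array in *out (the
   caller frees it; it may be NULL if nothing was read) and the number
   of values in *count. On allocation failure returns -1. */
int read_until_zero(FILE *in, int **out, size_t *count)
{
    int *data = NULL;
    size_t used = 0, cap = 0;
    int value;

    *out = NULL;
    *count = 0;
    while (fscanf(in, "%d", &value) == 1 && value != 0) {
        if (used == cap) {
            size_t new_cap = cap ? cap * 2 : 4;
            int *tmp = realloc(data, new_cap * sizeof *data);
            if (tmp == NULL) {
                free(data);          /* the old block is still ours to free */
                return -1;
            }
            data = tmp;
            cap = new_cap;
        }
        data[used++] = value;
    }
    *out = data;                     /* written through the caller's pointer */
    *count = used;
    return 0;
}
```

Note that the function never assigns to its own parameter out; it only writes through it, which is what makes the result visible in the caller.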

You can pattern your function along the POSIX getline() function.
The pattern is very simple. Your function receives a reference to the pointer (i.e., a pointer to a pointer) used for the data, resized dynamically; and a pointer to the size allocated to that pointer. It will return the number of elements read to the array.
If you were reading doubles rather than ints, and wished to read all doubles from the input until end-of-input (either end of file, if redirected from a file, or until the user types a non-number and presses Enter, or until the user presses Ctrl+D at the beginning of the line), the code would look something like this:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
size_t read_doubles(double **dataptr, size_t *sizeptr, FILE *input)
{
double *data; /* A local copy of the pointer */
size_t used = 0; /* Number of doubles in data */
size_t size; /* Number of doubles allocated for data */
/* Sanity checks against NULL pointers. */
if (!dataptr || !sizeptr || !input) {
errno = EINVAL; /* "Invalid parameter" */
return 0;
}
/* If *dataptr is NULL, or *sizeptr is zero,
there is no memory allocated yet. */
if (*dataptr != NULL && *sizeptr > 0) {
data = *dataptr;
size = *sizeptr;
} else {
data = NULL;
size = 0;
*dataptr = NULL;
*sizeptr = 0;
}
while (1) {
/* Ensure there is room in the data. */
if (used >= size) {
/* Need to allocate more.
Note: we have a copy of (data) in (*dataptr),
and of (size) in (*sizeptr). */
/* Reallocation policy. This one is simple,
reallocating in fixed-size chunks, but
better ones are well known: you probably
wish to ensure the size is at least some
sensible minimum (maybe a thousand or so),
then double the size up to a few million,
then increase the size in fixed-size chunks
of a few million, in a real-world application. */
size = used + 500;
/* Note: malloc(size) and realloc(NULL, size)
are equivalent; no need to check for NULL. */
data = realloc(data, size * sizeof data[0]);
if (!data) {
/* Reallocation failed, but the old data
pointer in (*dataptr) is still valid,
it isn't lost. Return an error. */
errno = ENOMEM;
return 0;
}
/* Reallocation succeeded; update the originals,
that are visible to the caller. */
*dataptr = data;
*sizeptr = size;
}
/* Read one more element, if possible.
Note: "&(data[used])" and "data+used"
are completely equivalent expressions.
*/
if (fscanf(input, " %lf", data + used) != 1)
break; /* No more input, or not a number. */
/* Yes, read a new data element. */
used++;
}
/* If we encountered a true read error,
return an error. */
if (ferror(input)) {
errno = EIO;
return 0;
}
/* Not an error; there just weren't more
data, or the data was not a number.
*/
/* Normally, programs do not set errno
except in case of an error. However,
here, used==0 just means there was no
data, it does not indicate an error per se.
For simplicity, because we know no error
has occurred, we just set errno=0 here,
rather than check if used==0 and only then
set errno to zero.
This also means it is safe to examine errno
after a call to this function, no matter what
the return value is. errno will be zero if no
errors have occurred, and nonzero in error cases.
*/
errno = 0;
return used;
}
The <errno.h> was included for the library to expose errno, and <string.h> for strerror(). These are both standard C.
However, the error constants I used above, EINVAL, ENOMEM, and EIO, are only defined by POSIXy systems, and might not exist in all systems. That is okay; you can just pick any smallish nonzero values and use them instead, because the function always sets errno. In that case, however, you need to check each of them and print the appropriate error message for each. In my case, all the systems I use define those three error codes for me, and I can just use strerror(errno) to convert the code to a standard error message (Invalid argument, Not enough space, and Input/output error, respectively, in non-localized programs).
Using a function defined like above, is very simple:
int main(void)
{
double *data = NULL; /* NULL for "Not allocated yet" */
size_t size = 0; /* 0 for "Not allocated yet" */
size_t used;
size_t i; /* Just a loop variable. */
used = read_doubles(&data, &size, stdin);
if (!used) {
/* No data read. Was it an actual error, or just no data? */
if (errno)
fprintf(stderr, "Error reading standard input: %s.\n", strerror(errno));
else
fprintf(stderr, "No numbers in standard input!\n");
return EXIT_FAILURE;
}
printf("Read %zu numbers from standard input.\n", used);
printf("(The dynamically allocated array has room for %zu.)\n", size);
for (i = 0; i < used; i++)
printf(" %f\n", data[i]);
/* Array no longer needed, so we can free it.
Explicitly NULLing and zeroing them means
we can safely reuse them later, if we were
to extend this program. So, it's not necessary
to clear them this way, but it is a good practice
considering it makes long-term maintenance easier. */
free(data);
data = NULL;
size = 0;
used = 0;
/* This version of the program has nothing more to do. */
return EXIT_SUCCESS;
}
Essentially, you just set the pointer you supply the address of to NULL, and the size you supply the address of also to 0, before the call to indicate no array has been dynamically allocated yet. There is no need to malloc() an initial array; realloc(NULL, size) is completely safe, and does exactly what malloc(size) does. Indeed, I often write code that has no malloc() anywhere in it, and use only realloc().
Note that the above code snippets are untested, so there might be typos in them. (And I did choose to use doubles instead of ints and a different end-of-input condition, to ensure you don't just copy-paste the code and use as-is, without reading and understanding it first.) If you find or suspect you found any, let me know in a comment, and I'll check.
Also note that the above code snippets are long only because I tried to write descriptive comments -- literally most of the "code" in them is comments. Writing descriptive comments -- those that describe the intent of the code, and not just what the code actually does; the latter is easy to read from the code itself, but the former is what you or others later reading the code need to know, to check if the code is sound or buggy --, is very hard, and even after over two decades, I'm still trying to get better at it.
If you like writing code, I do warmly recommend you start practicing writing good, intent-describing comments right away, rather than battle with it for decades like I have. It is surprising how much good comments, and occasionally a good night's sleep to review the code with a fresh pair of eyes, help.

Mysterious segfault though pointer is initialised

I am a newbie in C and I am trying to program a simple text editor, I have already written a 100 lines of stupid messy code, but it just worked. Until this SEGFAULT started showing up. I am going with the approach of switching terminal to canonical mode, and getting letter by letter from the user and do the necessary with each of 'em. The letters are added to a buffer, which is realloced extra 512 byte when the buffer is half filled, which I know is a stupid thing to do. But the cause of the SEGFAULT cant be determined. Help would be appreciated. Here's my code:
char* t_buf;
int t_buf_len = 0;
int cur_buf_sz = 0;
void mem_mgr(char *buffer, unsigned long bytes){ //Function to allocate requested memory
if(buffer == NULL){
if((buffer = malloc(sizeof(char) * bytes)) == NULL){
printf("ERROR:Cannot get memory resources(ABORT)\n\r");
exit(1);
}
}
else{
realloc(buffer, bytes);
}
}
void getCharacter(){
if(t_buf_len >= (cur_buf_sz/2))
mem_mgr(t_buf, cur_buf_sz+=512);
strcpy(t_buf, "Yay! It works!");
printf("%s %d", t_buf, cur_buf_sz);
}

There are a few things you need to understand first.
The buffer pointer is a local variable inside the mem_mgr() function. It initially points to the same memory as t_buf, but once you modify it, it is no longer related to t_buf in any way.
So, when you return from mem_mgr(), you lose the reference to the allocated memory, and t_buf is left unchanged.
To fix this, you can pass a pointer to the pointer, and alter the actual pointer by dereferencing it.
The realloc() function behaves exactly like malloc() if the first argument is NULL, as its documentation states.
The return value of memory allocation functions MUST be checked to ensure they returned a valid pointer. That's why you need a temporary pointer to store the return value of realloc(): if it returns NULL, meaning that there was no memory to fulfill the request, and you had assigned the result directly to your only pointer, you would lose the reference to the original block of memory and could no longer free it.
You need to pass a pointer to your pointer to mem_mgr(), like this:
int
mem_mgr(char **buffer, unsigned long bytes)
{
void *tmp = realloc(*buffer, bytes);
if (tmp != NULL) {
*buffer = tmp;
return 0;
}
return -1;
}
And then, to allocate memory
void
getCharacter()
{
if (t_buf_len >= (cur_buf_sz / 2)) {
if (mem_mgr(&t_buf, cur_buf_sz += 512) != -1) {
strcpy(t_buf, "Yay! It works!");
printf("%s %d", t_buf, cur_buf_sz);
}
}
}

The call to
mem_mgr(t_buf, cur_buf_sz+=512);
cannot change the actual parameter t_buf. You will either have to return the buffer from mem_mgr
t_buf = mem_mgr(t_buf, cur_buf_sz+=512);
or pass a pointer to t_buf
mem_mgr(&t_buf, cur_buf_sz+=512);
Furthermore, a call to realloc may change the address of the memory buffer, so you will have to use
char *tmpbuf = realloc(buffer, bytes);
if (!tmpbuf) {
// Error handling
} else {
buffer = tmpbuf;
}
realloc(NULL, bytes); will behave like a malloc, so you don't need a separate branch here. This makes in total:
char *mem_mgr(char *buffer, unsigned long bytes){ //Function to allocate requested memory
char *tmpbuf = realloc(buffer, bytes);
if (!tmpbuf) {
// Error handling
}
return tmpbuf;
}
which calls into question whether the function mem_mgr needs to exist at all.
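One caveat with the pointer-returning variant: writing t_buf = mem_mgr(t_buf, ...) directly would overwrite t_buf with NULL on failure and leak the old block. Here is a rough sketch of a safer call site. The helper grow_by and its parameters are hypothetical names of mine, not part of the original code.

```c
#include <stdlib.h>

/* Same shape as the pointer-returning mem_mgr(): returns the (possibly
   moved) block, or NULL on failure, in which case the old block is
   still valid and still has to be freed by the caller. */
char *mem_mgr(char *buffer, unsigned long bytes)
{
    return realloc(buffer, bytes);
}

/* Hypothetical caller-side helper: only overwrite *buf and *size once
   the allocation has succeeded. Returns 0 on success, -1 on failure. */
int grow_by(char **buf, unsigned long *size, unsigned long extra)
{
    char *tmp = mem_mgr(*buf, *size + extra);
    if (tmp == NULL)
        return -1;      /* *buf and *size keep their old, still-valid values */
    *buf = tmp;
    *size += extra;
    return 0;
}
```

Updating the size only after success also avoids the problem with cur_buf_sz += 512 in the argument list, which changes the recorded size even when the allocation fails.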

C memory allocation question

So I have a couple of functions that work with a string type I have created. One of them creates a dynamically allocated string. Another one takes said string and extends it. And the last one frees the string. Note: The function names are changed, but all are custom-defined by me.
string new = make("Hello, ");
adds(new, "everyone");
free(new);
The code above works - it compiles and runs fine. The code below does not work - it compiles, runs, and then fails with the error shown further down:
string new = make("Hello, ");
adds(new, "everyone!");
free(new);
The difference between the code is that the adds() function is adding 1 more character (a !). The character it adds makes no difference - just the length. Just for completeness, the following code does not work:
string new = make("Hello, ");
adds(new, "everyone");
adds(new, "!");
free(new);
Oddly, the following code, which uses a different function, addc() (which adds 1 character instead of a string) works:
string new = make("Hello, ");
adds(new, "everyone");
addc(new, '!');
free(new);
The following, which also does the same thing, works:
string new = make("Hello, everyone!");
free(new);
The error that all the ones that don't work give is this:
test(526) malloc: *** error for object 0x100130: double free
*** set a breakpoint in malloc_error_break to debug
(test is the extremely descriptive name of the program I have this in.)
As far as the function internals, my make() is a call to strlen() and two calls to malloc() and a call to memcpy(), my adds() is a call to strlen(), a call to realloc(), and a call to memcpy(), and my free() is two calls to the standard library free().
So are there any ideas why I'm getting this, or do I need to break down and use a debugger? I'm only getting it with adds()es of over a certain length, and not with addc()s.
Breaking down and posting code for the functions:
typedef struct _str {
int _len;
char *_str;
} *string;
string make(char *c)
{
string s = malloc(sizeof(string));
if(s == NULL) return NULL;
s->_len = strlen(c);
s->_str = malloc(s->_len + 1);
if(s->_str == NULL)
{
free(s);
return NULL;
}
memcpy(s->_str, c, s->_len);
return s;
}
int adds(string s, char *c)
{
int l = strlen(c);
char *tmp;
if(l <= 0) return -1;
tmp = realloc(s->_str, s->_len + l + 1);
if(!tmp) return 0;
memcpy(s->_str + s->_len, c, l);
s->_len += l;
s->_str[s->_len] = 0;
return s->_len;
}
void myfree(string s)
{
if(s->_str) free(s->_str);
free(s);
s = NULL;
return;
}

A number of potential problems I would fix:
1/ Your make() is dangerous since it's not copying across the null-terminator for the string.
2/ It also makes little sense to set s to NULL in myfree() since it's a passed parameter and will have no effect on the actual parameter passed in.
3/ I'm not sure why you return -1 from adds() if the added string length is 0 or less. First, it can't be negative. Second, it seems quite plausible that you could add an empty string, which should result in not changing the string and returning the current string length. I would only return a length of -1 if it failed (i.e. realloc() didn't work) and make sure the old string is preserved if that happens.
4/ You're not storing the tmp variable into s->_str even though it can change - it rarely re-allocates memory in-place if you're increasing the size although it is possible if the increase is small enough to fit within any extra space allocated by malloc(). Reduction of size would almost certainly re-allocate in-place unless your implementation of malloc() uses different buffer pools for different-sized memory blocks. But that's just an aside, since you're not ever reducing the memory usage with this code.
5/ I think your specific problem here is that you're only allocating space for string which is a pointer to the structure, not the structure itself. This means when you put the string in, you're corrupting the memory arena.
This is the code I would have written (including more descriptive variable names, but that's just my preference).
I've changed:
the return values from adds() to better reflect the length and error conditions. Now it only returns -1 if it couldn't expand (and the original string is untouched) - any other return value is the new string length.
the return from myfree(), in case you really do want to set the string to NULL with something like "s = myfree (s)".
the checks in myfree() for NULL string since you can now never have an allocated string without an allocated string->strChars.
Here it is, use (or don't :-) as you see fit:
#include <stdlib.h> /* malloc, free */
#include <string.h> /* strlen, strcpy, strcat */
/*================================*/
/* Structure for storing strings. */
typedef struct _string {
int strLen; /* Length of string */
char *strChars; /* Pointer to null-terminated chars */
} *string;
/*=========================================*/
/* Make a string, based on a char pointer. */
string make (char *srcChars) {
/* Get the structure memory. */
string newStr = malloc (sizeof (struct _string));
if (newStr == NULL)
return NULL;
/* Get the character array memory based on length, free the
structure if this cannot be done. */
newStr->strLen = strlen (srcChars);
newStr->strChars = malloc (newStr->strLen + 1);
if(newStr->strChars == NULL) {
free(newStr);
return NULL;
}
/* Copy in string and return the address. */
strcpy (newStr->strChars, srcChars);
return newStr;
}
/*======================================================*/
/* Add a char pointer to the end of an existing string. */
int adds (string curStr, char *addChars) {
char *tmpChars;
/* If adding nothing, leave it alone and return current length. */
int addLen = strlen (addChars);
if (addLen == 0)
return curStr->strLen;
/* Allocate space for new string, return error if cannot be done,
but leave current string alone in that case. */
tmpChars = malloc (curStr->strLen + addLen + 1);
if (tmpChars == NULL)
return -1;
/* Copy in old string, append new string. */
strcpy (tmpChars, curStr->strChars);
strcat (tmpChars, addChars);
/* Free old string, use new string, adjust length. */
free (curStr->strChars);
curStr->strLen = strlen (tmpChars);
curStr->strChars = tmpChars;
/* Return new length. */
return curStr->strLen;
}
/*================*/
/* Free a string. */
string myfree (string curStr) {
/* Don't mess up if string is already NULL. */
if (curStr != NULL) {
/* Free chars and the string structure. */
free (curStr->strChars);
free (curStr);
}
/* Return NULL so user can store that in string, such as
<s = myfree (s);> */
return NULL;
}
The only other possible improvement I could see would be to maintain a buffer of spare space at the end of strChars to allow a level of expansion without calling malloc().
That would require both a buffer length and a string length and changing the code to only allocate more space if the combined string length and new chars length is greater than the buffer length.
This would all be encapsulated in the function so the API wouldn't change at all. And, if you ever get around to providing functions to reduce the size of a string, they wouldn't have to re-allocate memory either, they'd just reduce their usage of the buffer. You'd probably need a compress() function in that case to reduce strings that have a large buffer and small string.
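For what it's worth, the spare-buffer idea could look roughly like the sketch below. The strbuf type, its field names, the strbuf_adds function and the doubling policy are all assumptions of mine for illustration, so unlike the suggestion above this does change the API. It also ignores overflow of the capacity for absurdly large strings.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of the string structure that tracks spare capacity. */
typedef struct {
    size_t len;   /* characters in use, excluding the terminator */
    size_t cap;   /* bytes allocated for chars, including the terminator */
    char  *chars; /* null-terminated once the first append has succeeded */
} strbuf;

/* Append addChars. Returns the new length, or (size_t)-1 on failure,
   in which case the existing contents are left untouched. */
size_t strbuf_adds(strbuf *sb, const char *addChars)
{
    size_t addLen = strlen(addChars);
    size_t need = sb->len + addLen + 1;

    if (need > sb->cap) {
        /* Start at 16 bytes and keep doubling until the result fits, so
           that a run of small appends rarely has to reallocate. */
        size_t newCap = (sb->cap < 16) ? 16 : sb->cap;
        while (newCap < need)
            newCap *= 2;
        char *tmp = realloc(sb->chars, newCap);
        if (tmp == NULL)
            return (size_t)-1;
        sb->chars = tmp;
        sb->cap = newCap;
    }
    /* Copy the new characters plus their terminator into the spare space. */
    memcpy(sb->chars + sb->len, addChars, addLen + 1);
    sb->len += addLen;
    return sb->len;
}
```

A buffer starts out as strbuf sb = {0, 0, NULL}; and is released with free(sb.chars). With these numbers, appending "Hello, " and then "everyone" fits in the first 16 bytes, and only the final "!" triggers a reallocation.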

The first malloc in make should be:
malloc (sizeof (struct _str));
Otherwise you're only allocating enough space for a pointer to struct _str.
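To see the size mismatch for yourself, a small sketch like this compares the two sizes. The two function names are made up for the example; the typedef is the one from the question.

```c
#include <stddef.h>

typedef struct _str {
    int   _len;
    char *_str;
} *string;

/* Bytes requested by the original make(): the size of a pointer. */
size_t bytes_requested(void)
{
    return sizeof(string);
}

/* Bytes the structure actually needs: an int, a pointer, and any padding. */
size_t bytes_needed(void)
{
    return sizeof(struct _str);
}
```

Writing the allocation as string s = malloc(sizeof *s); sidesteps the issue, because the size is then taken from whatever s points to rather than from a type name that has to be kept in sync by hand.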

tmp = realloc(s->_str, s->_len + l + 1);
realloc can return a new pointer to the requested block. You need to add the following line of code:
s->_str = tmp;
The reason it doesn't crash in one case but does after adding one more character is probably just down to how memory is allocated. There's probably a minimum allocation granularity (in this case 16 bytes). So when you allocate the first 8 chars for "Hello, ", it actually allocates 16. When you add "everyone", the total doesn't exceed 16, so you get the original block back. But for 17 chars, realloc returns a new memory buffer.
Try changing adds as follows:
tmp = realloc(s->_str, s->_len + l + 1);
if (!tmp) return 0;
if (tmp != s->_str) {
printf("Block moved!\n"); // for debugging
s->_str = tmp;
}

In function adds, you assume that realloc does not change the address of the memory block that needs to be reallocated:
tmp = realloc(s->_str, s->_len + l + 1);
if(!tmp) return 0;
memcpy(s->_str + s->_len, c, l);
While this may be true for small reallocations (because sizes of blocks of memory you get are usually rounded to optimize allocations), this is not true in general. When realloc returns you a new pointer, your program still uses the old one, causing the problem:
memcpy(s->_str + s->_len, c, l);
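A sketch of adds() that keeps realloc() but stores its result back might look like this. The -1 return on failure and the const parameter are my assumptions, not the asker's original conventions, and the sketch assumes s->_str is already null-terminated and s->_len is correct.

```c
#include <stdlib.h>
#include <string.h>

typedef struct _str {
    int   _len;
    char *_str;
} *string;

/* Append c to s. Returns the new length, or -1 on failure, in which
   case the old string is left intact. */
int adds(string s, const char *c)
{
    size_t l = strlen(c);
    char *tmp = realloc(s->_str, s->_len + l + 1);
    if (tmp == NULL)
        return -1;          /* s->_str still points at the old, valid block */
    s->_str = tmp;          /* the block may have moved: use the new address */
    memcpy(s->_str + s->_len, c, l + 1);    /* copies the terminator as well */
    s->_len += (int)l;
    return s->_len;
}
```

Because every later access goes through s->_str after the assignment, it no longer matters whether realloc() extended the block in place or moved it.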

Probably should post the code, but the double free means you are calling free on the same pointer twice.
Are you adding 1 to strlen for the \0 byte at the end?
Once you free a pointer, are you setting your member variable to NULL so that you don't free it again (or to a known bad pointer like 0xFFFFFFFF)?
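As a rough illustration of the set-to-NULL habit, here is a tiny helper. The name free_and_clear is hypothetical. It takes the address of the pointer, since assigning to a by-value parameter would not affect the caller.

```c
#include <stdlib.h>

/* Hypothetical helper: free the block and reset the caller's pointer, so
   that a second call becomes a harmless free(NULL) instead of a double
   free. */
void free_and_clear(char **p)
{
    free(*p);       /* free(NULL) is defined to do nothing */
    *p = NULL;
}
```

Note that this only protects the one variable that is passed in; any other copies of the old pointer are still dangling.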

Why is it that "my free() is two calls to the standard library free()"? Why are you calling free twice? You should only need to call it once.
Please post your adds() and free() functions.
