How to concat byte arrays in C

My current concat function:
char* concat(char* a, int a_size,
char* b, int b_size) {
char* c = malloc(a_size + b_size);
memcpy(c, a, a_size);
memcpy(c + a_size, b, b_size);
free(a);
free(b);
return c;
}
But this uses extra memory. Is it possible to append one byte array to another using realloc, without allocating a separate block for the result?
Like:
void append(char* a, int a_size, char* b, int b_size)
...
char* a = malloc(2);
char* b = malloc(2);
append(a, 2, b, 2);
//The size of a will be 4.

While Jean-François Fabre answered the stated question, I'd like to point out that you can manage such byte arrays better by using a structure:
typedef struct {
size_t max; /* Number of chars allocated for */
size_t len; /* Number of chars in use */
unsigned char *data;
} bytearray;
#define BYTEARRAY_INIT { 0, 0, NULL }
void bytearray_init(bytearray *barray)
{
barray->max = 0;
barray->len = 0;
barray->data = NULL;
}
void bytearray_free(bytearray *barray)
{
free(barray->data);
barray->max = 0;
barray->len = 0;
barray->data = NULL;
}
To declare an empty byte array, you can use either bytearray myba = BYTEARRAY_INIT; or bytearray myba; bytearray_init(&myba);. The two are equivalent.
When you no longer need the array, call bytearray_free(&myba);. Note that free(NULL) is safe and does nothing, so it is perfectly safe to free a bytearray that you have initialized, but not used.
To append to a bytearray:
int bytearray_append(bytearray *barray, const void *from, const size_t size)
{
if (barray->len + size > barray->max) {
const size_t len = barray->len + size;
size_t max;
void *data;
/* Example policy: */
if (len < 8)
max = 8; /* At least 8 chars, */
else
if (len < 4194304)
max = (3*len) / 2; /* grow by 50% up to 4,194,304 bytes, */
else
max = (len | 2097151) + 2097153 - 24; /* then pad to next multiple of 2,097,152 sans 24 bytes. */
data = realloc(barray->data, max);
if (!data) {
/* Not enough memory available. Old data is still valid. */
return -1;
}
barray->max = max;
barray->data = data;
}
/* Copy appended data; we know there is room now. */
memmove(barray->data + barray->len, from, size);
barray->len += size;
return 0;
}
Since this function can at least theoretically fail to reallocate memory, it will return 0 if successful, and nonzero if it cannot reallocate enough memory.
There is no need for a malloc() call, because realloc(NULL, size) is exactly equivalent to malloc(size).
The "growth policy" is a very debatable issue. You can just make max = barray->len + size, and be done with it. However, dynamic memory management functions are relatively slow, so in practice, we don't want to call realloc() for every small little addition.
The above policy tries to do something better, but not too aggressive: it always allocates at least 8 characters, even if less is needed. Up to 4,194,304 characters, it allocates 50% extra. Above that, it adds 2,097,152, rounds up to the next multiple of 2,097,152, and subtracts 24. The reasoning behind this is complex, but it is more for illustration and understanding than anything else; it is definitely NOT "this is best, and this is what you should do too". This policy ensures that each byte array allocates at most 4,194,304 = 2^22 unused characters. However, 2,097,152 = 2^21 is the size of a huge page on AMD64 (x86-64), and is a power-of-two multiple of a native page size on basically all architectures. It is also large enough to switch from so-called sbrk() allocation to memory mapping on basically all architectures that do that. It means that such huge allocations use a separate part of the heap for each, and the unused part is usually just virtual memory, not necessarily backed by any RAM, until accessed. As a result, this policy tends to work quite well for both very short byte arrays, and very long byte arrays, on most architectures.
Of course, if you know (or measure!) the typical size of the byte arrays in typical workloads, you can optimize the growth policy for that, and get even better results.
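To make the example policy easier to experiment with, it can be pulled out into a function of its own. This is only a sketch: the name bytearray_policy is hypothetical and not part of the code above, and the constants are simply the ones from the example policy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: capacity the example policy picks for a byte
   array that must hold at least `len` bytes. */
static size_t bytearray_policy(size_t len)
{
    size_t max;

    if (len < 8)
        max = 8;                               /* at least 8 chars */
    else if (len < 4194304)
        max = (3 * len) / 2;                   /* grow by 50% */
    else
        max = (len | 2097151) + 2097153 - 24;  /* 2 MiB steps, less 24 */

    assert(max >= len);  /* fails only if the arithmetic wrapped around */
    return max;
}
```

For instance, a request for 100 bytes yields a capacity of 150, and a request for 4,194,304 bytes yields 8,388,584. Swapping in a different policy then only means changing this one function.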
Finally, it uses memmove() instead of memcpy(), just in case someone wishes to repeat a part of the same byte array: memcpy() only works if the source and target areas do not overlap; memmove() works even in that case.
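As a small illustration of the overlap rule, separate from the append function itself, here is a sketch that shifts bytes within a single buffer. The helper name shift_right is made up for this example. The source and destination ranges overlap whenever gap is smaller than len, which is exactly the case where memcpy() has undefined behavior and memmove() is required.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: opens a gap of `gap` bytes at the front of buf
   by moving the first `len` bytes toward the end of the buffer.
   `cap` is the total size of buf. */
static void shift_right(unsigned char *buf, size_t len, size_t cap, size_t gap)
{
    assert(len + gap <= cap);
    /* Overlapping ranges: memmove() copies as if through a temporary. */
    memmove(buf + gap, buf, len);
}
```

After the call, bytes gap through gap + len - 1 hold the old contents, and the first gap bytes are left as they were for the caller to overwrite.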
When using more advanced data structures, like hash tables, a variant of the above structure is often useful. (That is, this is much better in cases where you have lots of empty byte arrays.)
Instead of having a pointer to the data, the data is part of the structure itself, as a C99 flexible array member:
typedef struct {
size_t max;
size_t len;
unsigned char data[];
} bytearray;
You cannot declare a byte array itself (i.e. bytearray myba; will not work); you always declare a pointer to such a byte array: bytearray *myba = NULL;. A NULL pointer is treated the same as an empty byte array.
In particular, to see how many data items such an array has, you use an accessor function (also defined in the same header file as the data structure), rather than myba.len:
static inline size_t bytearray_len(bytearray *const barray)
{
return (barray) ? barray->len : 0;
}
static inline size_t bytearray_max(bytearray *const barray)
{
return (barray) ? barray->max : 0;
}
The (expression) ? (if-true) : (if-false) is a ternary operator. In this case, the first function is exactly equivalent to
static inline size_t bytearray_len(bytearray *const barray)
{
if (barray)
return barray->len;
else
return 0;
}
If you wonder about the bytearray *const barray, remember that pointer declarations are read from right to left, with * as "a pointer to". So, it just means that barray is constant, a pointer to a byte array. That is, we may change the data it points to, but we won't change the pointer itself. Compilers can usually detect such stuff themselves, but it may help; the main point is however to remind us human programmers that the pointer itself is not to be changed. (Such changes would only be visible within the function itself.)
Since such arrays often need to be resized, the resizing is often put into a separate helper function:
bytearray *bytearray_resize(bytearray *const barray, const size_t len)
{
bytearray *temp;
if (!len) {
free(barray);
errno = 0;
return NULL;
}
if (!barray) {
temp = malloc(sizeof (bytearray) + len * sizeof barray->data[0]);
if (!temp) {
errno = ENOMEM;
return NULL;
}
temp->max = len;
temp->len = 0;
return temp;
}
if (barray->len > len)
barray->len = len;
if (barray->max == len)
return barray;
temp = realloc(barray, sizeof (bytearray) + len * sizeof barray->data[0]);
if (!temp) {
free(barray);
errno = ENOMEM;
return NULL;
}
temp->max = len;
return temp;
}
What does that errno = 0 do in there? The idea is that because resizing/reallocating a byte array may change the pointer, we return the new one. If the allocation fails, we return NULL with errno == ENOMEM, just like malloc()/realloc() do. However, since the desired new length was zero, this saves memory by freeing the old byte array if any, and returns NULL. But since that is not an error, we set errno to zero, so that it is easier for callers to check if an error occurred or not. (If the function returns NULL, check errno. If errno is nonzero, an error occurred; you can use strerror(errno) to get a descriptive error message.)
You probably also noted the sizeof barray->data[0], used even when barray is NULL. This is okay, because sizeof is not a function, but an operator: it does not access the right side at all, it only evaluates to the size of the thing the right side refers to. (You only need to use parentheses when the right side is a type.) This form is nice, because it lets a programmer change the type of the data member, without changing any other code.
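A short sketch of that point: the size computation can be done through a NULL pointer, because sizeof only looks at the type of its operand. The helper name bytearray_alloc_size is hypothetical; the structure is the flexible array member variant from this answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t max;
    size_t len;
    unsigned char data[];
} bytearray;

/* Hypothetical helper: bytes to allocate for a bytearray with room
   for `count` data elements. barray may be NULL. */
static size_t bytearray_alloc_size(const bytearray *barray, size_t count)
{
    /* Guard against the multiplication and addition wrapping around. */
    assert(count <= (SIZE_MAX - sizeof (bytearray)) / sizeof barray->data[0]);

    /* No dereference happens here: sizeof is resolved from the type. */
    return sizeof (bytearray) + count * sizeof barray->data[0];
}
```

If the data member were later changed to, say, uint32_t, this function would need no edits.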
To append data to such a byte array, we probably want to be able to specify whether we anticipate further appends to the same array, or whether this is probably the final append, so that only the exact amount of memory needed is allocated. The function below takes an exact flag for this purpose. Note that this function returns a pointer to the (modified) byte array:
bytearray *bytearray_append(bytearray *barray,
const void *from, const size_t size,
int exact)
{
size_t len = bytearray_len(barray) + size;
if (exact) {
barray = bytearray_resize(barray, len);
if (!barray)
return NULL; /* errno already set by bytearray_resize(). */
} else
if (bytearray_max(barray) < len) {
if (!exact) {
/* Apply growth policy */
if (len < 8)
len = 8;
else
if (len < 4194304)
len = (3 * len) / 2;
else
len = (len | 2097151) + 2097153 - 24;
}
barray = bytearray_resize(barray, len);
if (!barray)
return NULL; /* errno already set by the bytearray_resize() call */
}
if (size) {
memmove(barray->data + barray->len, from, size);
barray->len += size;
}
return barray;
}
This time, we declared bytearray *barray, because we change where barray points to in the function. If the fourth parameter, exact, is nonzero, then the resulting byte array is exactly the size needed; otherwise the growth policy is applied.

Yes, since realloc will preserve the start of your buffer if the new size is bigger:
char* concat(char* a, size_t a_size,
char* b, size_t b_size) {
char* c = realloc(a, a_size + b_size);
if (c == NULL)
return NULL; // realloc failed: a and b are untouched and still owned by the caller
memcpy(c + a_size, b, b_size); // dest is after "a" data, source is b with b_size
free(b);
return c;
}
c may be different from a (if the original memory block cannot be resized in place to the new size), but in that case realloc frees the block that a pointed to (you must not free it yourself) and moves the original data to the new block.
My advice is to warn the users of your function that the input buffers must be allocated using malloc, else it will crash badly.
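A variant closer to the append() the question sketches is shown below. It is only a sketch under a few assumptions: the name append_bytes and the int return convention are invented here, only the first buffer has to come from malloc, and the second buffer is left alone so the caller decides whether to free it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant: grows *a and copies b_size bytes from b after
   the first a_size bytes. Returns 0 on success, -1 if realloc fails.
   On failure *a is unchanged and still valid. */
static int append_bytes(char **a, size_t a_size, const char *b, size_t b_size)
{
    char *tmp;

    assert(a != NULL);
    if (b_size == 0)
        return 0;                 /* nothing to append */

    tmp = realloc(*a, a_size + b_size);
    if (tmp == NULL)
        return -1;                /* old block is kept by realloc */

    memcpy(tmp + a_size, b, b_size);
    *a = tmp;                     /* the block may have moved */
    return 0;
}
```

Passing the address of the pointer lets the function update it when realloc moves the block, and the temporary keeps the old pointer usable if the reallocation fails.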

Related

Pointer error when assigning data to a typedef struct in C

I am working on a series of C functions to allow a user to dynamically build an array. The core of the library resides in the Array struct which contains a pointer variable array that contains the array data, len which contains the length of the array, size, which is the total memory allocation for the array, elem, which contains the memory allocation per indices, and pointer variables name and dtype which contains strings describing the name of the array and the type of the array. For the moment I have constrained the scope so that only int, float, double, and char arrays can be considered.
Thus far I have defined, and individually tested the following functions;
array_mem_alloc which contains code that allocates memory for an array.
init_array which is a wrapper around array_mem_alloc that instantiates an Array struct, determines the data type and returns an Array data type to a user.
append_array which allows a user to dynamically grow an array one index at a time, or add an already defined array.
free_array which frees all memory and resets struct variables
int_array_val which typecasts the data at an index and returns to user. I have versions of this function for all relevant data types, but for this problem I will only use this version.
find_int_array_indices which looks for where a specific integer exists in the array and records the index number into another array which is returned to the user.
For the purposes of testing find_int_array_indices I am calling init_array for a variable titled arr_test and appending it with 7 integers int a[7] = {6, 1, 3, 6, 6, 4, 5}. I pass the Array container arr_test to the find_int_array_indices function and everything works fine, which also returns another Array container titled p. However, when I try to retrieve the integer variables with the int_array_val function it fails, because it does not recognize the variable array->dtype as containing the string "int". However, when I test the container inside of find_int_array_indices and in the main function, the variable does contain the string "int". This tells me that I probably have a pointer error, but I do not see it. Any advice would be very useful. I am wondering if I need to go back to the beginning and define name and dtype as fixed length arrays in the Array struct instead of as pointer variables.
array.h
typedef struct
{
void *array; // Pointer to array
size_t len; // Active length of array
size_t size; // Number of allocated indices
int elem; // Memory consumption per index
char *name; // The array name
char *dtype; // A string representing the datatype
} Array;
void array_mem_alloc(Array *array, size_t num_indices);
Array init_array(char *dtype, size_t num_indices, char *name);
int append_array(Array *array, void *elements, size_t count);
void free_array(Array *array);
int int_array_val(Array *array, int indice);
Array find_int_array_indices(Array *array, int integer);
array.c
void array_mem_alloc(Array *array, size_t num_indices) {
// Determine the total memory allocation and assign to pointer
void *pointer;
pointer = malloc(num_indices * array->elem);
// If memory is full fail gracefully
if (pointer == NULL) {
printf("Unable to allocate memory, exiting.\n");
free(pointer);
exit(0);
}
// Allocate resources and instantiate Array
else {
array->array = pointer;
array->len = 0;
array->size = num_indices;
}
}
// --------------------------------------------------------------------------------
Array init_array(char *dtype, size_t num_indices, char *name) {
// Determine memory blocks based on data type
int size;
if (strcmp(dtype, "float") == 0) size = sizeof(float);
else if (strcmp(dtype, "int") == 0) size = sizeof(int);
else if (strcmp(dtype, "double") == 0) size = sizeof(double);
else if (strcmp(dtype, "char") == 0) size = sizeof(char);
else {
printf("Data type not correctly entered into init_array, exiting program!\n");
exit(0);
}
// Allocate indice size and call array_mem_alloc
Array array;
array.dtype = dtype;
array.elem = size;
array_mem_alloc(&array, num_indices);
array.name = name;
return array;
}
// --------------------------------------------------------------------------------
int append_array(Array *array, void *elements, size_t count) {
// Allocate more memory if necessary
if (array->len + count > array->size) {
size_t size = (array->len + count) * 2;
void *pointer = realloc(array->array, size * array->elem);
// If memory is full return operations
if (pointer == NULL) {
printf("Unable to allocate memory, exiting.\n");
return 0;
}
// Allocate memory to variables and increment array size
array->array = pointer;
array->size = size;
}
// Append variables and increment the array length
memcpy((char *)array->array + array->len * array->elem, elements, count * array->elem);
array->len += count;
return 1;
}
// --------------------------------------------------------------------------------
void free_array(Array *array) {
// Free all memory in the array
free(array->array);
// Reset all variables in the struct
array->array = NULL;
array->size = 0;
array->len = 0;
array->elem = 0;
}
// --------------------------------------------------------------------------------
int int_array_val(Array *array, int indice) {
// Ensure array contains integers
printf("%s\n", array->dtype);
if (strcmp(array->dtype, "int") != 0) {
printf("Function can only return integer values, exiting function!\n");
exit(0);
}
// Cast value to an integer and return
int a = ((int *)array->array)[indice];
return a;
}
Array find_int_array_indices(Array *array, int integer) {
int number = 0;
int input;
for (int i = 0; i < array->len; i++) {
if (integer == int_array_val(array, i)) {
number++;
}
}
char dtype[7] = "int";
char name[9] = "indices";
Array indice_arr = init_array(dtype, number, name);
for (int i = 0; i < array->len; i++) {
input = i;
if (integer == int_array_val(array, i)) {
append_array(&indice_arr, &input, 1);
}
}
return indice_arr;
}
main.c
size_t indices = 10;
char name[6] = "array";
char dtype[7] = "int";
Array arr_test = init_array(dtype, indices, name);
int a[7] = {6, 1, 3, 6, 6, 4, 5};
append_array(&arr_test, a, 7);
Array p = find_int_array_indices(&arr_test, 6);
printf("%s\n", p.dtype); // This shows that p does contain dtype "int"
int d = int_array_val(&p, 0); // This fails in function, because it does not see dtype = "int"???
printf("%d\n", d);
In find_int_array_indices
char dtype[7] = "int";
char name[9] = "indices";
are both local variables, which are invalidated when the function returns. See: Dangling pointer and Lifetime.
init_array uses these values as if they had a lifetime to match its return value
Array array;
array.dtype = dtype;
array.elem = size;
array_mem_alloc(&array, num_indices);
array.name = name;
return array;
which, being a structure returned by value, has a lifetime determined by the context of its caller (return is a copy, after all).
find_int_array_indices completes the error when it returns indice_arr to main.
Some options:
Strictly use pointers to strings with static storage duration.
Change your structure definition to include space for these strings (or allocate it), and perform string copies.
Use an enumerated type instead.
Ditch this string-based, type-limited paradigm altogether by supporting all memory sizes generically (the naming feature remains an issue, though).
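The second option (storing copies of the strings inside the structure) could look roughly like the sketch below. Labels, labels_init, make_indices_labels and LABEL_MAX are hypothetical names chosen for illustration; Labels stands in for just the name and dtype members of Array.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LABEL_MAX 16  /* assumed limit, chosen for illustration */

typedef struct {
    char name[LABEL_MAX];
    char dtype[LABEL_MAX];
} Labels;  /* hypothetical stand-in for the string members of Array */

/* Copies both strings into storage owned by the struct itself,
   truncating anything longer than LABEL_MAX - 1 characters. */
static Labels labels_init(const char *dtype, const char *name)
{
    Labels l;

    assert(dtype != NULL && name != NULL);
    snprintf(l.dtype, sizeof l.dtype, "%s", dtype);
    snprintf(l.name, sizeof l.name, "%s", name);
    return l;
}

/* Mirrors the shape of find_int_array_indices: the local arrays stop
   existing on return, but the copies in the returned struct remain. */
static Labels make_indices_labels(void)
{
    char dtype[7] = "int";
    char name[9] = "indices";

    return labels_init(dtype, name);
}
```

Because the characters live inside the struct, returning it by value copies them along, and nothing points back into the dead stack frame.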
A rather long-winded continuation, to elaborate on using enumerated types:
The idea is to define a smaller set of acceptable values that your library works with, and making the user more aware of these values. As we can see, you have partially done that using strings but the implementation has some issues, as strings are generally clunky. Some problems with strings:
you have no control over the strings that users of your library use (this leads you to have to exit [1] the program in the event a user enters something unexpected, which is easy to do),
you must account for their potentially large or excess memory consumption,
string comparison is O(N),
strings are generally unsafe in C, requiring more care than other basic constructs when handling them (assignment, comparison, storage).
So instead of using strings ("foo", "bar", "qux" in these examples), we use an enumerated type
enum OBJECT_TYPE {
OBJECT_FOO,
OBJECT_BAR,
OBJECT_QUX
};
which establishes the following:
it is more clear what the acceptable values are
some [2] control over what users enter, via type hinting
comparison is O(1)
handling is the same as any integral type
The structure definition then looks like
typedef struct {
/* ... whatever members are needed for the structure */
size_t something_based_on_type;
enum OBJECT_TYPE type;
char debug_name[MAX_DEBUG_NAME];
} Object;
Nothing can really be done about the name member of your structure. If you want user defined nametags for things, then yes, as stated previously, you need to allocate space for them.
Our initialization function works similarly, but we can [2] take advantage of some properties of integral types.
void object_init(Object *object, enum OBJECT_TYPE type, const char *debug_name) {
/* ... accept other arguments, whatever is needed to initialize */
size_t value_translations[] = { 42, 51, 99 };
object->type = type;
/* while neat, this is somewhat naive, see footnotes */
object->something_based_on_type = value_translations[type];
if (debug_name && strlen(debug_name) < MAX_DEBUG_NAME)
strcpy(object->debug_name, debug_name);
else
*object->debug_name = '\0';
}
Now we want to provide a function that works with our generic data of only type OBJECT_FOO (like your int_array_val). Again, the comparison is much easier to understand.
void object_print_foo(Object *o) {
if (OBJECT_FOO != o->type)
/* handle type mismatch */;
}
Although it would be better to provide a generic object_print function that again branches based on o->type.
A main function for completeness:
int main(void) {
Object a;
object_init(&a, OBJECT_QUX, "object_a");
object_print_foo(&a);
}
This is the general idea of using enumerated types.
With all that said, I think this is not really any better than just handling arbitrary data sizes, risks included. Something like
const void *array_get(Array *array, size_t index) {
if (index >= array->len)
return NULL;
return (char *) array->array + index * array->elem;
}
works, if the user respects the const contract, and uses the correct types (they would need to remember their typing with specifically typed getters too).
Generic data structures in C are a bit of a leap of faith no matter what.
1. So a note on exiting from library code: don't. As a library author, you have no reasonable right to cause user programs to terminate (unless requested, or the user invokes UB outside your control). Delegate upwards, return errors, and let the user exit the program on their own terms, as they may need to perform their own cleanups (or might carry on if the failure is non-critical).
2. C's enumeration type is rather weak. enum are actually just int, and users can enter plain integer values outside the specified ranges. This is akin to invoking undefined behavior from a library's point of view, but we may wish to protect the user anyway.
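Both footnotes can be combined in one small sketch: validate the enum value before using it as an index, and report a bad value to the caller instead of exiting. The sentinel OBJECT_TYPE_COUNT and the function object_type_value are hypothetical additions, not part of the code above; static_assert assumes a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>

enum OBJECT_TYPE {
    OBJECT_FOO,
    OBJECT_BAR,
    OBJECT_QUX,
    OBJECT_TYPE_COUNT  /* hypothetical sentinel: number of valid types */
};

/* Fails the build if a type is added without extending the table. */
static_assert(OBJECT_TYPE_COUNT == 3, "update value_translations");

/* Hypothetical lookup: stores the per-type value in *out.
   Returns 0 on success, -1 if type is out of range or out is NULL. */
static int object_type_value(enum OBJECT_TYPE type, size_t *out)
{
    static const size_t value_translations[OBJECT_TYPE_COUNT] = { 42, 51, 99 };

    if (out == NULL || (int) type < 0 || (int) type >= OBJECT_TYPE_COUNT)
        return -1;  /* let the caller decide what to do */

    *out = value_translations[type];
    return 0;
}
```

The caller checks the return value and handles the error on its own terms, which keeps the library from terminating the program.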

Convert pointer to 2D char array in C

This probably has been asked already, but I'm unable to find anything on it.
I have a string array, where the numbers of strings in it is determined at runtime (the max string length is known, if that helps). Since I need global access to that array, I used a pointer and malloc'ed enough space to it when I actually know how much has to fit in there:
char *global_strings;
void some_func(int strings_nr, int strings_size)
{
global_strings = (char*) malloc(strings_nr* strings_size* sizeof(char));
}
What would be the correct way in C to use this pointer like a two-dimensional char array equivalent to
global_strings[strings_nr][strings_size] ?
For a global pointer to 2D data whose N*M dimensions are defined at run time, I'd recommend using a helper function to access the strings rather than indexing the pointer directly. Make it inline, or a macro, if desired.
char *global_strings = NULL;
size_t global_strings_nr = 0;
size_t global_strings_size = 0;
// Allocation -
// OK to call again, but prior data may not be organized well with a new string_size
// More code needed to handle that.
void some_func(int strings_nr, int strings_size) {
global_strings_nr = strings_nr; // save for later use
global_strings_size = strings_size; // save for later use
global_strings = realloc(global_strings,
sizeof *global_strings * strings_nr * strings_size);
if (global_strings == NULL) {
global_strings_nr = global_strings_size = 0;
}
}
// Access function
char *global_strings_get(size_t index) {
if (index >= global_strings_nr) {
return NULL;
}
return global_strings + index*global_strings_size;
}
#define GLOBAL_STRINGS_GET_WO_CHECK(index) \
(global_strings + (index)*global_strings_size)
Better to use size_t for array indexing and sizing than int.
Casts not needed.
Memory calculations should begin with a size_t rather than int * int * size_t.
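To show how such an accessor might be used, here is a self-contained sketch that repeats the accessor and adds a setter. The functions global_strings_alloc and global_strings_set are hypothetical additions; the setter assumes each row should hold a NUL-terminated string and silently truncates text that does not fit.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static char *global_strings = NULL;
static size_t global_strings_nr = 0;
static size_t global_strings_size = 0;

/* Hypothetical allocator: zero-filled storage for nr rows of size chars.
   Returns 0 on success, -1 on failure. Assumes nothing is allocated yet. */
static int global_strings_alloc(size_t nr, size_t size)
{
    global_strings = calloc(nr, size);
    if (global_strings == NULL)
        return -1;
    global_strings_nr = nr;
    global_strings_size = size;
    return 0;
}

/* Same accessor idea as above: row `index`, or NULL if out of range. */
static char *global_strings_get(size_t index)
{
    if (index >= global_strings_nr)
        return NULL;
    return global_strings + index * global_strings_size;
}

/* Hypothetical setter: stores a truncated, NUL-terminated copy of text
   in row `index`. Returns 0 on success, -1 for a bad index. */
static int global_strings_set(size_t index, const char *text)
{
    char *row = global_strings_get(index);

    assert(text != NULL);
    if (row == NULL || global_strings_size == 0)
        return -1;
    snprintf(row, global_strings_size, "%s", text);
    return 0;
}
```

With this in place, global_strings_get(i)[j] plays the role of global_strings[i][j], with the bounds check on i done in one spot.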

realloc of array inside a struct

I'm trying to write a function that uses realloc() to extend the array pointed to within an instance of a struct, however I can't seem to get it to work.
The relevant part of my code is:
struct data_t {
int data_size;
uint16_t *data;
};
void extend_data(data_t container, uint16_t value) {
// adds an additional uint16_t to the array of DATA, updates its internal
// variables, and initialises the new uint to VALUE.
int len_data = sizeof(*(container->data)) / sizeof(uint16_t);
printf("LENGTH OF DATA: %d\n", len_data);
container->data = realloc(container->data, sizeof(*(container->data))+sizeof(uint16_t));
container->data_size++;
container->data[container->data_size-1] = value;
len_data = sizeof(*(container->data)) / sizeof(uint16_t);
printf("LENGTH OF DATA: %d\n", len_data);
printf("data_size: %d\n", container->data_size);
return;
}
Can anybody see what the problem is with this?
Edit
As R. Sahu points out, container is not a pointer in this function - when you said the code "wasn't working", I assumed you meant that you weren't growing your array, but what you've written here won't even compile.
Are you sure you've copied this code correctly? If so, does "not working" mean you're getting a compile-time error, a run-time error, or just unexpected output?
If you've copied the code as written, then the first thing you need to do is change the function prototype to
void extend_data(data_t *container, uint16_t value) {
and make sure you're passing a pointer to your data_t type, otherwise the update won't be reflected in calling code.
Original
In the line
container->data = realloc(container->data, sizeof(*(container->data))+sizeof(uint16_t));
sizeof(*(container->data)) evaluates to sizeof (uint16_t). container->data is a pointer to, not an array of, uint16_t; sizeof will give you the size of a single uint16_t element, not the number of elements you've allocated. What you want to do is something like the following:
/**
* Don't assign the result of a realloc call back to the original
* pointer - if the call fails, realloc will return NULL and you'll
* lose the reference to your original buffer. Assign the result to
* a temporary, then after making sure the temporary is not NULL,
* assign that back to your original pointer.
*/
uint16_t *tmp = realloc(container->data, sizeof *container->data * (container->data_size + 1) );
if ( tmp )
{
/**
* Only add to container->data and update the value of container->data_size
* if the realloc call succeeded.
*/
container->data = tmp;
container->data[container->data_size++] = value;
}
You don't calculate the new size correctly. Consider this:
typedef struct {
size_t size;
int *data;
} int_array;
#define INT_ARRAY_INIT { 0, NULL}
void int_array_resize(int_array *const array,
const size_t newsize)
{
if (!array) {
fprintf(stderr, "int_array_resize(): NULL int_array.\n");
exit(EXIT_FAILURE);
}
if (!newsize) {
free(array->data);
array->data = 0;
array->size = 0;
} else
if (newsize != array->size) {
void *temp;
temp = realloc(array->data, newsize * sizeof array->data[0]);
if (!temp) {
fprintf(stderr, "int_array_resize(): Out of memory.\n");
exit(EXIT_FAILURE);
}
array->data = temp;
array->size = newsize;
}
}
/* int_array my_array = INT_ARRAY_INIT;
is equivalent to
int_array my_array;
int_array_init(&my_array);
*/
void int_array_init(int_array *const array)
{
if (array) {
array->size = 0;
array->data = NULL;
}
}
void int_array_free(int_array *const array)
{
if (array) {
free(array->data);
array->size = 0;
array->data = NULL;
}
}
The key point is newsize * sizeof array->data[0]. This is the number of chars needed for newsize elements of whatever type array->data[0] has. Both malloc() and realloc() take the size in chars.
If you initialize new structures of that type using int_array my_array = INT_ARRAY_INIT; you can just call int_array_resize() to resize it. (realloc(NULL, size) is equivalent to malloc(size); free(NULL) is safe and does nothing.)
The int_array_init() and int_array_free() are just helper functions to initialize and free such arrays.
Personally, whenever I have dynamically resized arrays, I keep both the allocated size (size) and the size used (used):
typedef struct {
size_t size; /* Number of elements allocated for */
size_t used; /* Number of elements used */
int *data;
} int_array;
#define INT_ARRAY_INIT { 0, 0, NULL }
A function that ensures there are at least need elements that can be added is then particularly useful. To avoid unnecessary reallocations, the function implements a policy that calculates the new size to allocate for, as a balance between amount of memory "wasted" (allocated but not used) and number of potentially slow realloc() calls:
void int_array_need(int_array *const array,
const size_t need)
{
size_t size;
void *data;
if (!array) {
fprintf(stderr, "int_array_need(): NULL int_array.\n");
exit(EXIT_FAILURE);
}
/* Large enough already? */
if (array->size >= array->used + need)
return;
/* Start with the minimum size. */
size = array->used + need;
/* Apply growth/reallocation policy. This is mine. */
if (size < 256)
size = (size | 15) + 1;
else
if (size < 2097152)
size = (3 * size) / 2;
else
size = (size | 1048575) + 1048577 - 8;
/* TODO: Verify (size * sizeof array->data[0]) does not overflow. */
data = realloc(array->data, size * sizeof array->data[0]);
if (!data) {
/* Fallback: Try minimum allocation. */
size = array->used + need;
data = realloc(array->data, size * sizeof array->data[0]);
}
if (!data) {
fprintf(stderr, "int_array_need(): Out of memory.\n");
exit(EXIT_FAILURE);
}
array->data = data;
array->size = size;
}
There are many opinions on what kind of reallocation policy you should use, but it really depends on the use case.
There are three things in the balance: number of realloc() calls, as they might be "slow"; memory fragmentation if different arrays are grown requiring many realloc() calls; and amount of memory allocated but not used.
My policy above tries to do many things at once. For small allocations (up to 256 elements), it rounds the size up to the next multiple of 16. That is my attempt at a good balance between memory used for small arrays, and not very many realloc() calls.
For larger allocations, 50% is added to the size. This reduces the number of realloc() calls, while keeping the allocated but unused/unneeded memory below 50%.
For really large allocations, when you have 2^21 elements or more, the size is rounded up to the next multiple of 2^20, less a few elements. This caps the number of allocated but unused elements to about 2^21, or two million elements.
(Why less a few elements? Because it does not harm on any systems, and on certain systems it may help a lot. Some systems, including x86-64 (64-bit Intel/AMD) on certain operating systems and configurations, support large ("huge") pages that can be more efficient in some ways than normal pages. If they are used to satisfy an allocation, I want to avoid the case where an extra large page is allocated just to cater for the few bytes the C library needs internally for the allocation metadata.)
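The overflow check left as a TODO in int_array_need() could be sketched as a small wrapper like the one below. The name realloc_array is hypothetical, and callers are assumed to pass a nonzero count and element size, since realloc() with a size of zero behaves differently across implementations.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: resizes ptr to hold count elements of elem_size
   bytes each. Returns NULL, leaving ptr untouched, if the byte count
   would not fit in size_t or if the allocation fails. */
static void *realloc_array(void *ptr, size_t count, size_t elem_size)
{
    assert(count > 0 && elem_size > 0);

    if (count > SIZE_MAX / elem_size)
        return NULL;  /* count * elem_size would wrap around */

    return realloc(ptr, count * elem_size);
}
```

Dividing SIZE_MAX by the element size avoids performing the multiplication that might overflow in the first place.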
It appears you aren't using sizeof correctly. In your struct you've defined a uint16_t pointer, not an array. The size of the uint16_t* data type is the size of a pointer on your system. You need to store the size of the allocated memory along with the pointer if you want to be able to accurately resize it. It appears you already have a field for this with data_size. Your example could be fixed as follows:
// I was unsure of the typedef-ing happening with data_t so I made it more explicit in this example
typedef struct {
int data_size;
uint16_t* data;
} data_t;
void extend_data(data_t* container, uint16_t value) {
// adds an additional uint16_t to the array of DATA, updates its internal
// variables, and initialises the new uint to VALUE.
// CURRENT LENGTH OF DATA
int len_data = container->data_size * sizeof(uint16_t);
printf("LENGTH OF DATA: %d\n", len_data);
uint16_t* tmp = realloc(container->data, (container->data_size + 1) * sizeof(uint16_t));
if (tmp) {
// realloc succeeded, so it is now safe to overwrite the pointer in `container`.
// Assigning a NULL result directly would lose the old block and leak it.
container->data = tmp;
container->data_size++;
container->data[container->data_size-1] = value;
} else {
// Handle allocation failure
}
len_data = container->data_size * sizeof(uint16_t);
printf("LENGTH OF DATA: %d\n", len_data);
printf("data_size: %d\n", container->data_size);
return;
}
void extend_data(data_t container, ...
In your function, container is not a pointer but the struct itself, passed by value, so you can't use the -> operator.
The realloc'ed memory will be lost, because you work on a local copy of the passed structure, and that copy is discarded when the function returns.
sizeof(*(container.data)) / sizeof(uint16_t)
This is always 1, because it is sizeof(uint16_t) / sizeof(uint16_t).
Why: the data member is a pointer to uint16_t, so *data has the type uint16_t.
sizeof is evaluated at compile time, not at run time, and it does not return the amount of memory allocated by malloc.
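The difference between a real array and a pointer to allocated memory can be seen in a short sketch. ARRAY_COUNT and question_length are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Element count of a true array. Only meaningful where the array type
   is still visible, never on a pointer. (Hypothetical macro name.) */
#define ARRAY_COUNT(a) (sizeof (a) / sizeof (a)[0])

/* Hypothetical demo: evaluates the question's expression for a heap
   block of n elements and returns the result. */
static size_t question_length(size_t n)
{
    uint16_t *data;
    size_t result;

    assert(n > 0);
    data = malloc(n * sizeof *data);

    /* *data has type uint16_t, so this is the same value for every n. */
    result = sizeof(*data) / sizeof(uint16_t);

    free(data);
    return result;
}
```

For a declared array such as uint16_t fixed[8], ARRAY_COUNT(fixed) gives 8, while question_length() returns 1 no matter how many elements were allocated, which is why the count has to be stored separately in data_size.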

What is the opposite of calloc in C

It is more than a funny question. :-)
I wish to initialize an array in C, but instead of zeroing out the array with calloc, I want to set all elements to one. Is there a single function that does just that?
I have used my question above to search in google, no answer. Hope you can help me out! FYI, I am first year CS student just starting to program in C.
There isn't a standard C memory allocation function that allows you to specify a value other than 0 that the allocated memory is initialized to.
You could easily enough write a cover function to do the job:
void *set_alloc(size_t nbytes, char value)
{
void *space = malloc(nbytes);
if (space != 0)
memset(space, value, nbytes);
return space;
}
Note that this assumes you want to set each byte to the same value. If you have a more complex initialization requirement, you'll need a more complex function. For example:
void *set_alloc2(size_t nelems, size_t elemsize, const void *initializer)
{
void *space;
if (elemsize != 0 && nelems > (size_t)-1 / elemsize)
return 0; /* nelems * elemsize would overflow */
space = malloc(nelems * elemsize);
if (space != 0)
{
for (size_t i = 0; i < nelems; i++)
memmove((char *)space + i * elemsize, initializer, elemsize);
}
return space;
}
Example usage:
struct Anonymous
{
double d;
int i;
short s;
char t[2];
};
struct Anonymous a = { 3.14159, 23, -19, "A" };
struct Anonymous *b = set_alloc2(20, sizeof(struct Anonymous), &a);
memset is there for you:
memset(array, value, length);
There is no such function. You can implement it yourself with a combination of malloc() and either memset() (for character data) or a for loop (for other integer data).
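For the asker's case (every element set to one), the malloc() plus loop combination could look like the sketch below. The helper name alloc_filled_int is hypothetical. Note that memset() writes the same value into every byte, so on a platform with 4-byte int, memset(arr, 1, n * sizeof(int)) would leave each element equal to 0x01010101 (16843009), not 1.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates nelems ints, each set to value.
   Returns NULL on size overflow or allocation failure. */
int *alloc_filled_int(size_t nelems, int value)
{
    if (nelems > SIZE_MAX / sizeof(int))
        return NULL;                /* nelems * sizeof(int) would overflow */
    int *arr = malloc(nelems * sizeof *arr);
    if (arr == NULL)
        return NULL;
    for (size_t i = 0; i < nelems; i++)
        arr[i] = value;             /* memset could only set individual bytes */
    return arr;
}
```

Usage would be int *ones = alloc_filled_int(100, 1); followed by a NULL check and, eventually, free(ones).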
The impetus for the calloc() function's existence (vs. malloc() + memset()) is that it can be a nice performance optimization in some cases. If you're allocating a lot of data, the OS might be able to give you a range of virtual addresses that are already initialized to zero, which saves you the extra cost of manually writing out 0's into that memory range. This can be a large performance gain because you don't need to page all of those pages in until you actually use them.
Under the hood, calloc() might look something like this:
void *calloc(size_t count, size_t size)
{
// Error checking omitted for expository purposes
size_t total_size = count * size;
if (total_size < SOME_THRESHOLD) // e.g. the OS's page size (typically 4 KB)
{
// For small allocations, allocate from normal malloc pool
void *mem = malloc(total_size);
memset(mem, 0, total_size);
return mem;
}
else
{
// For large allocations, allocate directly from the OS, already zeroed (!)
return mmap(NULL, total_size, PROT_READ|PROT_WRITE, MAP_ANON|MAP_PRIVATE, -1, 0);
// Or on Windows, use VirtualAlloc()
}
}

Copying very large strings in memory [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I'm trying to implement a solution to copy a large string in memory in C.
Can you give me any advice about implementation or any reference?
I'm thinking of copying byte by byte since I don't know the length (I probably can't calculate it with strlen() since the string is very large).
Another concern is that I will have to reallocate memory on every step and I don't know the best way to do that. Is there any way that I can reallocate using only the reference to the last position of the memory already allocated and filled? That way, if the memory allocation fails, it will not affect the rest of the memory already filled.
What is the best value to return from this function? Should I return the number of bytes that were successfully copied?
If a memory allocation fails, does realloc() set any global variable that I can check in the main function after I call the copying function? I don't want to just return NULL if realloc() fails at some point; I want to return a more useful value.
strlen() won't fail, as it uses size_t to describe the string's size, and size_t is large enough to hold the size of any object on the machine the program runs on.
So simply do
#define _XOPEN_SOURCE 500 /* for strdup */
#include <string.h>
int duplicate_string(const char * src, char ** pdst)
{
int result = 0;
if (NULL == ((*pdst) = strdup(src)))
{
result = -1;
}
return result;
}
If this fails, try using a cleverer structure to hold the data, for example by chopping it into slices:
#define _XOPEN_SOURCE 700 /* for strndup */
#include <stdlib.h> /* for calloc, free */
#include <string.h>
int slice_string(const char * src, char *** ppdst, size_t s)
{
int result = 0;
size_t len = strlen(src);
size_t n;
if (0 == s)
{
*ppdst = NULL;
return -1; /* A slice size of 0 makes no sense. */
}
n = len/s + (len%s ?1 :0); /* Number of slices of at most s chars each. */
*ppdst = calloc(n + 1, sizeof(**ppdst)); /* +1 to have a stopper element. */
if (NULL == (*ppdst))
{
result = -1;
goto lblExit;
}
for (size_t i = 0; i < n; ++i)
{
(*ppdst)[i] = strndup(src, s);
if (NULL == (*ppdst)[i])
{
result = -1;
while (i > 0) /* Free every slice allocated so far, including index 0. */
{
free((*ppdst)[--i]);
}
free(*ppdst);
*ppdst = NULL;
goto lblExit;
}
src += strlen((*ppdst)[i]); /* Advance by what was actually copied; never past the terminator. */
}
lblExit:
return result;
}
Use these functions by trying a plain copy first and, if that fails, slicing the string.
int main(void)
{
char * s = NULL;
read_big_string(&s);
int result = 0;
char * d = NULL;
char ** pd = NULL;
/* First try a plain copy. */
result = duplicate_string(s, &d);
if (0 != result)
{
/* Second, try to slice it, halving the slice size until it works. */
{
size_t len = strlen(s);
do
{
len = len/2 + (len%2 ?1 :0);
result = slice_string(s, &pd, len);
} while ((0 != result) && (1 < len)); /* Stop on success or when slices cannot get smaller. */
}
}
if (0 != result)
{
fprintf(stderr, "Duplicating the string failed.\n");
}
/* Use copies. */
if (NULL != d)
{
/* Use result from simple duplication. */
}
if (NULL != pd)
{
/* Use result from sliced duplication. */
}
/* Free the copies. */
if (NULL != pd)
{
for (size_t i = 0; pd[i]; ++i)
{
free(pd[i]);
}
}
free(pd);
free(d);
return 0;
}
realloc() failing
If a memory allocation fails, does realloc() set any global variable that I can check in the main function after I call the copying function? I don't want to just return NULL if realloc() fails at some point; I want to return a more useful value.
There's no problem with realloc() returning null if you use realloc() correctly. If you use realloc() incorrectly, you get what you deserve.
Incorrect use of realloc()
char *space = malloc(large_number);
space = realloc(space, even_larger_number);
If the realloc() fails, this code has overwritten the only reference to the previously allocated space with NULL, so not only have you failed to allocate new space but you also cannot release the old space because you've lost the pointer to it.
(For the fastidious: the fact that the original malloc() might have failed is not critical; space will be NULL, but that's a valid first argument to realloc(). The only difference is that there would be no previous allocation that was lost.)
Correct use of realloc()
char *space = malloc(large_number);
char *new_space = realloc(space, even_larger_number);
if (new_space != 0)
space = new_space;
This saves and tests the result of realloc() before overwriting the value in space.
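One way to give the caller a more useful result than a bare null pointer is to wrap the pattern above in a function that reports success or failure through its return value. The sketch below is an illustration only, and the name resize_buffer is hypothetical. On POSIX systems a failed realloc() also sets errno to ENOMEM, although the C standard itself does not require that.

```c
#include <stdlib.h>

/* Hypothetical helper: resizes *pbuf to new_size bytes.
   Returns 0 on success and updates *pbuf; returns -1 on failure and
   leaves *pbuf pointing at the old, still valid block. */
int resize_buffer(char **pbuf, size_t new_size)
{
    if (new_size == 0)
        return -1;   /* realloc with size 0 is not portable; reject it */
    char *tmp = realloc(*pbuf, new_size);
    if (tmp == NULL)
        return -1;
    *pbuf = tmp;
    return 0;
}
```

Because *pbuf is untouched on failure, the caller can keep using the existing data, retry with a smaller size, or free the buffer and give up.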
Continually growing memory
Another concern is that I will have to reallocate memory on every step and I don't know the best way to do that. Is there any way that I can reallocate using only the reference to the last position of the memory already allocated and filled? That way, if the memory allocation fails, it will not affect the rest of the memory already filled.
The standard technique for avoiding quadratic behaviour (which really does matter when you're dealing with megabytes of data) is to double the space allocated for your working string when you need to grow it. You do that by keeping three values:
Pointer to the data.
Size of the data area that is allocated.
Size of the data area that is in use.
When the incoming data won't fit in the space that is unused, you reallocate the space, doubling the amount that is allocated unless you need more than that for the new space. If you think you're going to be adding more data later, then you might add double the new amount. This amortizes the cost of the memory allocations, and saves copying the unchanging data as often.
struct String
{
char *data;
size_t length;
size_t allocated;
};
int add_data_to_string(struct String *str, char const *data, size_t datalen)
{
if (str->length + datalen >= str->allocated)
{
size_t newlen = 2 * (str->allocated + datalen + 1);
char *newdata = realloc(str->data, newlen);
if (newdata == 0)
return -1;
str->data = newdata;
str->allocated = newlen;
}
memcpy(str->data + str->length, data, datalen + 1);
str->length += datalen;
return 0;
}
When you've finished adding to the string, you can release the unused space if you wish:
void release_unused(struct String *str)
{
char *data = realloc(str->data, str->length + 1);
if (data != 0)
{
/* Only update on success; if realloc fails, the old block is still valid. */
str->data = data;
str->allocated = str->length + 1;
}
}
It is very unlikely that shrinking a memory block will move it, but the standard says:
The realloc function deallocates the old object pointed to by ptr and returns a
pointer to a new object that has the size specified by size. The contents of the new
object shall be the same as that of the old object prior to deallocation, up to the lesser of
the new and old sizes.
The realloc function returns a pointer to the new object (which may have the same
value as a pointer to the old object), or a null pointer if the new object could not be
allocated.
Note that 'may have the same value as a pointer to the old object' also means 'may have a different value from a pointer to the old object'.
The code assumes that it is dealing with null terminated strings; the memcpy() code copies the length plus one byte to collect the terminal null, for example, and the release_unused() code keeps a byte for the terminal null. The length element is the value that would be returned by strlen(), but it is crucial that you don't keep doing strlen() on megabytes of data. If you are dealing with binary data, you handle things subtly differently.
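As a rough sketch of what "subtly differently" could mean for binary data (an assumption on my part, with hypothetical names struct Buffer and add_data_to_buffer): no terminating null is copied or reserved, exactly datalen bytes are appended, and the length can never be recovered with strlen(), so it must always be tracked.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical binary counterpart of struct String: no terminator is kept. */
struct Buffer
{
    unsigned char *data;
    size_t length;     /* bytes in use */
    size_t allocated;  /* bytes allocated */
};

/* Appends datalen raw bytes. Returns 0 on success, -1 on failure. */
int add_data_to_buffer(struct Buffer *buf, const void *data, size_t datalen)
{
    if (datalen > (size_t)-1 - buf->length)
        return -1;                        /* length + datalen would overflow */
    size_t needed = buf->length + datalen;
    if (needed > buf->allocated)
    {
        /* Double the required size unless doubling itself would overflow. */
        size_t newlen = (needed > (size_t)-1 / 2) ? needed : 2 * needed;
        unsigned char *newdata = realloc(buf->data, newlen);
        if (newdata == 0)
            return -1;                    /* old block is still valid */
        buf->data = newdata;
        buf->allocated = newlen;
    }
    if (datalen > 0)
        memcpy(buf->data + buf->length, data, datalen);
    buf->length += datalen;
    return 0;
}
```

A buffer starts out as struct Buffer b = { NULL, 0, 0 }; and embedded zero bytes are stored like any other byte.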
Use a smart pointer and avoid copying in the first place.
OK, let's use Cunningham's Question to help figure out what to do. Cunningham's Question (or Query - your choice :-) is:
What's the simplest thing that could possibly work?
-- Ward Cunningham
IMO the simplest thing that could possibly work would be to allocate a large buffer, suck the string into the buffer, reallocate the buffer down to the actual size of the string, and return a pointer to that buffer. It's the caller's responsibility to free the buffer they get when they're done with it. Something on the order of:
#define BIG_BUFFER_SIZE 100000000
char *read_big_string(FILE *f) /* read a big string from a file */
{
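char *buf = malloc(BIG_BUFFER_SIZE);
char *shrunk;
fgets(buf, BIG_BUFFER_SIZE, f);
shrunk = realloc(buf, strlen(buf)+1); /* realloc may move the block, so keep its result */
return shrunk ? shrunk : buf; /* if shrinking fails, buf is still valid */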
}
This is example code only. There are #includes which are not included, and there's a fair number of possible errors which are not handled in the above, the implementation of which are left as an exercise for the reader. Your mileage may vary. Dealer contribution may affect cost. Check with your dealer for price and options available in your area. Caveat codor.
Share and enjoy.

Resources