Alternative to heap allocated strings in C (with long lifetimes)

Is it possible to use "temporary string objects" in a C program?
For example, I have a reasonably large array of char * objects (part of a struct), which are currently using heap allocated memory. I have an opportunity to reduce my program's memory usage, since most of these names can be determined without using an explicit character array (although not all of them can).
In C++, I'd simply create an API that returns a std::string object (by value) and be done with it. In C, I can come up with no solutions that I'm thrilled about. Advice?
Here's some code, as requested:
typedef struct FOO {
char *name;
...
} FOO;
extern FOO* global_foo_array; /* in .h file */
void setup_foo(void) {
int i;
global_foo_array = (FOO*) malloc ( get_num_foo() * sizeof(FOO) );
for (i = 0; i < get_num_foo_with_complex_name(); ++i) {
global_foo_array[i].name = malloc ( ... );
}
for (i = get_num_foo_with_complex_name(); i < get_num_foo(); ++i) {
char buf[100];
sprintf( buf, "foo #%d", i );
global_foo_array[i].name = strdup( buf );
}
}
get_num_foo() is approximately 100-1000x larger than get_num_foo_with_complex_name(). The program, for the most part, treats 'global_foo_array[].name' as read-only.
I see a non-negligible memory savings here and am wondering what's the best way to achieve that savings, balancing the human investment.

If you don't actually need all the strings, the obvious choice would be to create them on demand:
typedef struct FOO {
char *name;
...
} FOO;
static FOO* global_foo_array; /* NOT in .h file */
void setup_foo(void) {
int i;
global_foo_array = (FOO*) malloc ( get_num_foo() * sizeof(FOO) );
memset(global_foo_array, 0, get_num_foo() * sizeof(FOO) );
}
FOO *get_foo(int i) {
if (i < 0 || i >= get_num_foo())
return 0;
if (!global_foo_array[i].name) {
if (i < get_num_foo_with_complex_name()) {
global_foo_array[i].name = malloc ( ... );
} else {
char buf[32];
sprintf( buf, "foo #%d", i );
global_foo_array[i].name = strdup( buf );
}
}
return &global_foo_array[i];
}
This still wastes space for all the FOO objects you don't need. If those are large, it might be better to have static FOO **global_foo_array (an extra level of indirection), and allocate those on demand too.
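The extra level of indirection could be sketched roughly as follows. This is only an illustration under stated assumptions: NUM_FOO is a made-up fixed count standing in for get_num_foo(), the FOO here carries nothing but the name, and only the generated "foo #N" names are handled (complex names would need their own branch).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the question's FOO; only the name is modelled. */
typedef struct FOO {
    char *name;
} FOO;

#define NUM_FOO 1000            /* assumed fixed count, for illustration only */

static FOO *foo_table[NUM_FOO]; /* static storage: every slot starts as NULL */

/* Return entry i, creating it (and its generated name) on first use.
   Returns NULL for an out-of-range index or on allocation failure. */
FOO *get_foo(int i)
{
    char buf[32];
    size_t len;
    FOO *f;

    if (i < 0 || i >= NUM_FOO)
        return NULL;
    if (foo_table[i])
        return foo_table[i];    /* already created by an earlier call */

    f = malloc(sizeof *f);
    if (!f)
        return NULL;
    len = (size_t)snprintf(buf, sizeof buf, "foo #%d", i);
    f->name = malloc(len + 1);
    if (!f->name) {
        free(f);
        return NULL;
    }
    memcpy(f->name, buf, len + 1); /* copy including the terminator */
    foo_table[i] = f;
    return f;
}
```

With this layout an untouched entry costs one pointer rather than a whole FOO, at the price of one extra allocation per entry that is actually used.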

Related

Dynamically increasing C string's size

I'm currently creating a program that captures the user's keypresses and stores them in a string. I wanted the string that stores the keypresses to be dynamic, but I came across a problem.
My current code looks something like this:
#include <stdio.h>
#include <stdlib.h>
typedef struct Foo {
const char* str;
int size;
} Foo;
int main(void)
{
int i;
Foo foo;
foo.str = NULL;
foo.size = 0;
for (;;) {
for (i = 8; i <= 190; i++) {
if (GetAsyncKeyState(i) == -32767) { // if key is pressed
foo.str = (char*)realloc(foo.str, (foo.size + 1) * sizeof(char)); // Access violation reading location xxx
sprintf(foo.str, "%s%c", foo.str, (char)i);
foo.size++;
}
}
}
return 0;
}
Any help would be appreciated, as I don't have any ideas anymore. :(
Should I maybe also allocate the Foo object dynamically?
First, in order to handle things nicely, you need to define
typedef struct Foo {
char* str;
int size;
} Foo;
Otherwise, Foo is really annoying to mutate properly: with str declared as const char*, you invoke undefined behaviour by modifying what foo.str points to after the realloc call in any way.
The seg fault is actually caused by sprintf(foo.str, "%s%c", foo.str, (char)i);, not the call to realloc. foo.str is, in general, not null-terminated.
In fact, you're duplicating work by calling sprintf at all. realloc already copies all the characters previously in foo.str, so all you have to do is add a single character via
foo.str[foo.size] = (char) i;
Edit to respond to comment:
If we wanted to append two strings (or rather, two Foos) together, we could do that as follows:
void appendFoos(Foo* const first, const Foo* const second) {
first->str = realloc(first->str, (first->size + second->size) * (sizeof(char)));
memcpy(first->str + first->size, second->str, second->size);
first->size += second->size;
}
The appendFoos function modifies first by appending second onto it.
Throughout this code, we leave Foos as non-null terminated. However, to convert to a string, you must add a final null character after reading all other characters.
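The conversion to a null-terminated string might look something like the sketch below. The helper name foo_to_cstring is made up for illustration, and the Foo definition is repeated only so the block stands alone.

```c
#include <stdlib.h>
#include <string.h>

typedef struct Foo {
    char *str;   /* size bytes, not null-terminated */
    int size;
} Foo;

/* Return a freshly allocated, null-terminated copy of foo's bytes,
   or NULL if allocation fails. The caller frees the result. */
char *foo_to_cstring(const Foo *foo)
{
    char *out = malloc((size_t)foo->size + 1);  /* +1 for the terminator */
    if (!out)
        return NULL;
    if (foo->size > 0)
        memcpy(out, foo->str, (size_t)foo->size);
    out[foo->size] = '\0';
    return out;
}
```

Copying into a separate buffer keeps the Foo itself unterminated, which matches the convention used in appendFoos.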
const char *str - you declare a pointer to const char. You can't write to the referenced object, as doing so invokes UB.
You use sprintf just to add the char. It makes no sense.
You do not need a pointer in the structure.
You need to set compiler options to compile as C, not C++.
I would do it a bit different way:
typedef struct Foo {
size_t size;
char str[1];
} Foo;
Foo *addCharToFoo(Foo *f, char ch)
{
if(f)
{
f = realloc(f, sizeof(*f) + f -> size);
}
else
{
f = realloc(f, sizeof(*f) + 1);
if(f) f-> size = 0;
}
if(f) //check if realloc did not fail
{
f -> str[f -> size++] = ch;
f -> str[f -> size] = 0;
}
return f;
}
and in the main
int main(void)
{
int i;
Foo *foo = NULL, *tmp;
for (;;)
{
for (i = 8; i <= 190; i++)
{
if (GetAsyncKeyState(i) == -32767) { // if key is pressed
if ((tmp = addCharToFoo(foo, (char)i)))
{
foo = tmp;
}
else
{
/* do something - realloc failed */
}
}
}
}
return 0;
}
sprintf(foo.str, "%s%c", foo.str, (char)i); is ill-formed: the first argument cannot be const char *. You should see a compiler error message.
After fixing this (make str be char *), then the behaviour is undefined because the source memory read by the %s overlaps with the destination.
Instead you would need to use some other method to append the character that doesn't involve overlapping read and writes (e.g. use the [ ] operator to write the character and don't forget about null termination).
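A minimal sketch of that approach, writing the character with the [ ] operator and keeping the buffer null-terminated at all times. The function name foo_append_char is made up, and size is a size_t here rather than the int used in the question.

```c
#include <stdlib.h>

typedef struct Foo {
    char *str;    /* always null-terminated once non-NULL */
    size_t size;  /* number of characters, excluding the terminator */
} Foo;

/* Append ch to foo. Returns 1 on success, 0 if realloc fails
   (in which case foo is left unchanged). */
int foo_append_char(Foo *foo, char ch)
{
    /* one byte for the new character, one for the terminator */
    char *grown = realloc(foo->str, foo->size + 2);
    if (!grown)
        return 0;
    grown[foo->size] = ch;
    grown[foo->size + 1] = '\0';
    foo->str = grown;
    foo->size++;
    return 1;
}
```

Assigning the realloc result to a temporary first means the old buffer is not lost if the call fails. Starting from Foo foo = { NULL, 0 }, each keypress becomes one call to foo_append_char(&foo, (char)i).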

Using an array of strings to implement a symbol table in C

I am trying to use an array of structs to create a symbol table. This is what I have so far, but I am having trouble allocating memory in the create function. Is what I have so far correct?
I want something like this as my final result for arr
{ {"sym1"; 1}, {"sym2"; 2}, {"sym3"; 3} }
struct str_id {
char* s;
int id;
}
struct symbol_table {
int count;
struct str_id** arr;
}
struct symbol_table *symbol_table_create(void) {
struct symbol_table *stt = malloc(sizeof(struct symbol_table));
stt->count = 1;
stt->arr = malloc(sizeof(struct str_id*) * stt->count);
return stt;
}
Use descriptive names for identifiers, not cryptic short names (like s and str_id).
Avoid Systems Hungarian Notation (i.e. naming or prefixing identifiers after their type or what-they-are as opposed to what-they-mean).
In your case, I assume str_id is an abbreviation for struct_id (or string_id) - which is a bad name because it's already immediately obvious that it's a struct (or contains a string).
It was popular right until the 1990s when programmers started using more powerful editors and IDEs that kept track of variable types - it just isn't needed today.
Always check if a heap allocation succeeded or failed by comparing calloc and malloc's return values to NULL. This can be done with if( !some_pointer ) abort().
Don't use assert( some_pointer ) because assertions are compiled out when NDEBUG is defined (typically in release builds); use abort instead, as it signifies abnormal program termination compared to exit.
Pass a size_t parameter so consumers can specify the size of the symbol table.
Quantities of objects held in memory should be expressed as size_t (e.g. array indexers). Never use int for this!
You need to put a semi-colon at the end of each struct definition.
Are you sure you want an array-of-pointers-to-structs and not just an array-of-structs? In this case you can use inline structs and use a single allocation for the array, instead of allocating each member separately.
Because you're performing custom allocation, you must also define a destructor function.
#include <stdbool.h>
#include <stdlib.h>
struct symbol_table_entry {
char* symbolText;
int id;
};
struct symbol_table {
size_t count;
struct symbol_table_entry* entries; // a single array of structs, not an array of pointers
};
struct symbol_table* create_symbol_table( size_t count ) {
struct symbol_table* stt = malloc( sizeof(struct symbol_table) );
if( !stt )
{
abort();
}
stt->count = count;
stt->entries = calloc( count, sizeof(struct symbol_table_entry) );
if( !stt->entries ) {
free( stt );
abort();
}
// Note that calloc will zero-initialize all entries of the array (which prevents debuggers showing garbage string contents) so we don't need to do it ourselves.
return stt;
}
void destroy_symbol_table( struct symbol_table* stt, bool free_strings ) {
if( !stt ) {
return;
}
if( stt->entries ) {
if( free_strings ) {
for( size_t i = 0; i < stt->count; i++ ) {
free( stt->entries[i].symbolText ); // free(NULL) is a no-op for unused slots
}
}
free( stt->entries );
}
free( stt );
}
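Filling the table to get the result the question asks for could then look roughly like this. The helper symbol_table_set is a made-up name, and the sketch assumes the array-of-structs layout suggested in the answer; the struct definitions are repeated so the block stands alone.

```c
#include <stdlib.h>
#include <string.h>

struct symbol_table_entry {
    char *symbolText;
    int id;
};

struct symbol_table {
    size_t count;
    struct symbol_table_entry *entries;  /* one contiguous array of structs */
};

/* Store a private copy of text, plus the id, in slot index.
   Returns 0 on success, -1 on a bad index or allocation failure. */
int symbol_table_set(struct symbol_table *stt, size_t index,
                     const char *text, int id)
{
    size_t len;
    char *copy;

    if (index >= stt->count)
        return -1;
    len = strlen(text) + 1;               /* include the terminator */
    copy = malloc(len);
    if (!copy)
        return -1;
    memcpy(copy, text, len);
    free(stt->entries[index].symbolText); /* free(NULL) is a no-op */
    stt->entries[index].symbolText = copy;
    stt->entries[index].id = id;
    return 0;
}
```

Because each entry owns a copy of its text, the table should later be destroyed with the strings freed as well.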

How to design a function which returns an array of OIDs

As already written at issue#2217, I want to design a function which returns a list of OIDs in the first out param.
Should I:
Return the list of oids as a pointer to pointer?
int git_commit_tree_last_commit_id(git_oid **out, git_repository *repo, const git_commit *commit, char *path)
Or return the list of oids as a pointer to a custom struct?
int git_commit_tree_last_commit_id(git_oid_xx_struct *out, git_repository *repo, const git_commit *commit, char *path)
What is your advice?
There are two questions: how do you know how many OIDs are in the returned array, and who allocates the underlying memory?
For the first part there are several possibilities:
Return the number in a separate return parameter.
Use a sentinel value to terminate the list.
Return a new struct type, like git_strarray, that contains the count and the raw data.
For the second part, either:
The caller can allocate the underlying memory.
The function can allocate the memory.
The new struct type can manage the memory.
Which path you go down depends upon what you want the code to look like, how much you expect it to be reused, how critical performance is etc.
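For comparison, the sentinel option could be sketched as below, using an array of pointers terminated by NULL. Everything here is hypothetical: the OID struct merely stands in for a real type such as git_oid, and the function names and placeholder data are invented for illustration.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a real OID type such as git_oid. */
typedef struct OID {
    unsigned char id[20];
} OID;

/* Build a NULL-terminated array of n OID pointers. Each OID is
   zero-filled except for its first byte, which holds the index
   (placeholder data only). Returns NULL on allocation failure. */
OID **make_oid_list(size_t n)
{
    size_t i;
    OID **list = malloc((n + 1) * sizeof *list);  /* +1 for the sentinel */
    if (!list)
        return NULL;
    for (i = 0; i < n; i++) {
        list[i] = calloc(1, sizeof **list);
        if (!list[i]) {
            while (i > 0)          /* undo the allocations made so far */
                free(list[--i]);
            free(list);
            return NULL;
        }
        list[i]->id[0] = (unsigned char)i;
    }
    list[n] = NULL;                /* the sentinel */
    return list;
}

/* Count entries by walking to the sentinel. */
size_t oid_list_length(OID *const *list)
{
    size_t n = 0;
    while (list[n])
        n++;
    return n;
}

/* Free every OID and then the array itself. */
void free_oid_list(OID **list)
{
    size_t i;
    if (!list)
        return;
    for (i = 0; list[i]; i++)
        free(list[i]);
    free(list);
}
```

The caller needs no separate count, but finding the length costs a walk over the list, and the sentinel only works for an array of pointers (or for a value type that has a reserved "invalid" value).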
To start with I'd go with the simplest, which IMO is function returns count and allocates memory.
That means my function would have to look like this:
int get_some_oids_in_an_array(OID** array, int * count, ... ) {
int i;
...
*count = number_of_oids;
*array = (OID*)malloc( sizeof(OID)*number_of_oids);
if (*array == NULL) {
return -1;
}
for(i=0; i<number_of_oids; ++i) {
(*array)[i]=...; /* parentheses needed: [] binds tighter than unary * */
}
...
return 0;
}
/* Example of usage */
void use_get_oids() {
OID* oids;
int n_oids;
int i;
int ok = get_some_oids_in_an_array(&oids, &n_oids, ...);
if (ok != 0) {
return;
}
for(i=0; i<n_oids; ++i ) {
... use oids[i] ...
}
free(oids);
}
Note: I'm returning an array of OID, rather than an array of OID*, either is a valid option, and which will work best for you will vary.
If it turned out I was using this kind of pattern often, then I would consider switching to the struct route.
typedef struct oidarray {
size_t count;
OID* oids;
} oidarray;
int get_some_oids( oidarray * oids, ... ) {
size_t i;
oidarray_ensure_size(oids, number_of_oids);
for(i=0; i<number_of_oids; ++i) {
oidarray_set_value(oids, i, ...);
}
return 0;
}
/* Example of usage */
void use_get_oids() {
size_t i;
oidarray oids = {0};
get_some_oids(&oids, ...);
for(i=0; i<oids.count; ++i) {
... use oids.oids[i] ...
}
oidarray_release(&oids);
}
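The oidarray helper functions are not defined in the answer. One possible shape for them is sketched below; the signatures are assumptions, and the OID struct is again only a stand-in for a real type.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a real OID type. */
typedef struct OID {
    unsigned char id[20];
} OID;

typedef struct oidarray {
    size_t count;
    OID *oids;
} oidarray;

/* Resize the array to hold exactly count entries.
   Returns 0 on success, -1 on failure (array left unchanged). */
int oidarray_ensure_size(oidarray *arr, size_t count)
{
    OID *grown;

    if (count == 0) {
        free(arr->oids);
        arr->oids = NULL;
        arr->count = 0;
        return 0;
    }
    grown = realloc(arr->oids, count * sizeof *grown);
    if (!grown)
        return -1;
    arr->oids = grown;
    arr->count = count;
    return 0;
}

/* Copy *value into slot i; out-of-range indexes are ignored. */
void oidarray_set_value(oidarray *arr, size_t i, const OID *value)
{
    if (i < arr->count)
        arr->oids[i] = *value;
}

/* Free the storage and reset the struct so it can be reused. */
void oidarray_release(oidarray *arr)
{
    free(arr->oids);
    arr->oids = NULL;
    arr->count = 0;
}
```

Because a zero-initialized oidarray is a valid empty array, callers can declare one on the stack and pass its address without a separate init call.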

Pointer Conventions with: Array of pointers to certain elements

This question is about the best practices to handle this pointer problem I've dug myself into.
I have an array of structures that is dynamically generated in a function that reads a csv.
int init_from_csv(instance **instances,char *path) {
... open file, get line count
*instances = (instance*) malloc( (size_t) sizeof(instance) * line_count );
... parse and set values of all instances
return count_of_valid_instances_read;
}
// in main()
instance *instances;
int ins_len = init_from_csv(&instances, "some/path/file.csv");
Now, I have to perform functions on this raw data, split it, and perform the same functions again on the splits. This data set can be fairly large so I do not want to duplicate the instances, I just want an array of pointers to structs that are in the split.
instance **split = (instance**) malloc (sizeof(instance*) * split_len_max);
int split_function(instance *instances, int ins_len, instance **split) {
int i, c;
c = 0;
for (i = 0; i < ins_len; i++) {
if (some_criteria_is_true) {
split[c++] = &instances[i];
}
}
return c;
}
Now my question: what would be the best practice or most readable way to perform a function on both the array of structs and the array of pointers? For a simple example, take count_data().
int count_data (instance **ins, int ins_len, float crit) {
int i,c;
c = 0;
for (i = 0; i < ins_len; i++) {
if (ins[i]->data > crit) {
++c;
}
}
return c;
}
// code smell-o-vision going off by now
int c1 = count_data (split, ins_len, 0.05); // works
int c2 = count_data (&instances, ins_len, 0.05); // obviously seg faults
I could make my init_from_csv malloc an array of pointers to instances, and then malloc my array of instances. I want to learn how a seasoned C programmer would handle this sort of thing, though, before I start changing a bunch of code.
This might seem a bit grungey, but if you really want to pass that instances** pointer around and want it to work for both the main data set and the splits, you really need to make an array of pointers for the main data set too. Here's one way you could do it...
size_t i, mem_reqd;
instance **list_seg, *data_seg;
/* Allocate list and data segments in one large block */
mem_reqd = (sizeof(instance*) + sizeof(instance)) * line_count;
list_seg = (instance**) malloc( mem_reqd );
data_seg = (instance*) &list_seg[line_count];
/* Index into the data segment */
for( i = 0; i < line_count; i++ ) {
list_seg[i] = &data_seg[i];
}
*instances = list_seg;
Now you can always operate on an array of instance* pointers, whether it's your main list or a split. I know you didn't want to use extra memory, but if your instance struct is not trivially small, then allocating an extra pointer for each instance to prevent confusing code duplication is a good idea.
When you're done with your main instance list, you can do this:
void free_instances( instance** instances )
{
free( instances );
}
I would be tempted to implement this as a struct:
typedef struct instance_list {
instance ** data;
size_t length;
int owner;
} instance_list;
That way, you can return this from your functions in a nicer way:
instance_list* alloc_list( size_t length, int owner )
{
size_t i, mem_reqd;
instance_list *list;
instance *data_seg;
/* Allocate list and data segments in one large block */
mem_reqd = sizeof(instance_list) + sizeof(instance*) * length;
if( owner ) mem_reqd += sizeof(instance) * length;
list = (instance_list*) malloc( mem_reqd );
list->data = (instance**) &list[1];
list->length = length;
list->owner = owner;
/* Index the list */
if( owner ) {
data_seg = (instance*) &list->data[length];
for( i = 0; i < length; i++ ) {
list->data[i] = &data_seg[i];
}
}
return list;
}
void free_list( instance_list * list )
{
free(list);
}
void erase_list( instance_list * list )
{
if( list->owner ) return;
memset((void*)list->data, 0, sizeof(instance*) * list->length);
}
Now, your function that loads from CSV doesn't have to focus on the details of creating this monster, so it can simply do the task it's supposed to do. You can now return lists from other functions, whether they contain the data or simply point into other lists.
instance_list* load_from_csv( char *path )
{
/* get line count... */
instance_list *list = alloc_list( line_count, 1 );
/* parse csv ... */
return list;
}
etc... Well, you get the idea. No guarantees this code will compile or work, but it should be close. I think that whenever you're doing something with arrays that's even slightly more complicated than a simple array, it's worth making that tiny extra effort to encapsulate it. This is the major data structure you'll be working with for your analysis or whatever, so it makes sense to give it a little bit of stature in that it has its own data type.
I dunno, was that overkill? =)

How to copy array of struct in C

I have defined struct like
typedef struct {
char *oidkey;
int showperf;
char oidrealvalue[BUFSIZE];
char *oidlimits;
} struct_oidpairs;
and I have array of struct
struct_oidpairs b[] ={{.....},....}
and I want to copy it to new struct array a[]
please help
Something like this (where count is the number of elements in src):
memcpy(dest, src, sizeof(struct_oidpairs) * count);
Your struct contains pointers as data members, which means you will have to roll your own copy function that does something sensible with the pointers. memcpy only works if all the data related to the struct is stored in the struct.
For real copy of the contents, follow Sjoerd's answer and then:
for (i = 0; i < sizeof(src) / sizeof(src[0]); i++) /* src must be a true array here, not a pointer */
{
if (src[i].oidkey != NULL)
{
dest[i].oidkey = malloc(strlen(src[i].oidkey) + 1);
strcpy(dest[i].oidkey, src[i].oidkey);
}
if (src[i].oidlimits != NULL)
{
dest[i].oidlimits = malloc(strlen(src[i].oidlimits) + 1);
strcpy(dest[i].oidlimits, src[i].oidlimits);
}
}
You may consider memcpy if you are interested in speed.
Update: Following harper's code, I updated the code to check for NULL pointers.
This is a quoted note from gordongekko:
This solution will crash if oidkey or oidlimits are != NULL and not '\0'-terminated (that is, not initialized).
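Wrapped up as a pair of functions with allocation checks, a deep copy might look roughly like the sketch below. The names copy_oidpairs, free_oidpairs and dup_or_null are invented for illustration, and BUFSIZE is given an assumed value because the question does not show it. As the quoted note says, every non-NULL oidkey and oidlimits must point to a properly terminated string.

```c
#include <stdlib.h>
#include <string.h>

#define BUFSIZE 64  /* assumed value; the question does not give it */

typedef struct {
    char *oidkey;
    int showperf;
    char oidrealvalue[BUFSIZE];
    char *oidlimits;
} struct_oidpairs;

/* Duplicate a string, passing NULL through unchanged.
   Sets *failed to 1 if an allocation fails. */
static char *dup_or_null(const char *s, int *failed)
{
    size_t len;
    char *copy;

    if (!s)
        return NULL;
    len = strlen(s) + 1;
    copy = malloc(len);
    if (!copy) {
        *failed = 1;
        return NULL;
    }
    memcpy(copy, s, len);
    return copy;
}

/* Free an array produced by copy_oidpairs, including its strings. */
void free_oidpairs(struct_oidpairs *a, size_t count)
{
    size_t i;

    if (!a)
        return;
    for (i = 0; i < count; i++) {
        free(a[i].oidkey);
        free(a[i].oidlimits);
    }
    free(a);
}

/* Deep-copy count elements (count must be at least 1).
   Returns NULL if any allocation fails. */
struct_oidpairs *copy_oidpairs(const struct_oidpairs *src, size_t count)
{
    size_t i;
    int failed = 0;
    struct_oidpairs *dest = malloc(count * sizeof *dest);

    if (!dest)
        return NULL;
    for (i = 0; i < count; i++) {
        dest[i] = src[i];  /* copies showperf and oidrealvalue */
        dest[i].oidkey = dup_or_null(src[i].oidkey, &failed);
        dest[i].oidlimits = dup_or_null(src[i].oidlimits, &failed);
        if (failed) {
            free_oidpairs(dest, i + 1);  /* elements 0..i are initialized */
            return NULL;
        }
    }
    return dest;
}
```

For the array in the question, the call would be copy_oidpairs(b, sizeof b / sizeof *b), with a matching free_oidpairs once the copy is no longer needed.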
This does NOT make a deep copy:
struct_oidpairs b[] = {...};
size_t len = sizeof b/sizeof*b;
struct_oidpairs *a = malloc(sizeof b);
if( !a ) { /* ERROR handling: do not fall through to memcpy */ }
memcpy( a, b, sizeof b );
...
free( a );
or
while( len-- )
a[len] = b[len];
...
free( a );
Or maybe a plain loop, taking the element count from sizeof (C arrays have no size() member):
for (size_t i = 0; i < sizeof b / sizeof *b; i++)
{
a[i] = b[i];
}
