Efficiently "Wrapping" Constant Strings in C - c

I'm currently writing a parser in C, and one of the things that I needed when designing it was a mutable string "class" (a set of functions that operate on opaque structs representing instances), which I called my_string. Instances of the string class are little more than structs that wrap a char *, along with some metadata.
A problem arises, though, with constant strings. For example, I have several methods that return my_string * pointers, but sometimes I want to return a constant string. Consider this contrived pseudo-code:
my_string *element_get_data(my_element *el)
{
if (element_has_character_data(el))
return element_get_character_data(el); /* returns a (my_string *) */
else
return my_string_constant("#data"); /* ditto */
}
… where in some cases I want to fetch a pre-built my_string instance, but in others I want to just return the string "#data" wrapped in a my_string struct.
The problem with this code is that it creates a new (heap-allocated) my_string instance every time element_get_data(...) is called. C constant strings have nice semantics in that they're statically allocated in the program's DATA section, so every time a constant string is encountered, the address of that string is always the same.
It therefore seems silly to have several different my_string instances all pointing to the exact same char *. What's an efficient way to deduplicate this? Should I keep a hash table of const char * -> my_string * mappings? Or is there a way to use similar semantics to C constant strings? On the Mac, Core Foundation manages to do this with the CFSTR(...) macro.
The ideal solution to me would be to somehow craft a macro like my_string_constant(...) that stores a my_string struct in the DATA section of the program, so it too can be constant. Is such a thing possible?

While I was writing up this question (or rather, almost immediately after I finished), I found the answer to my question in the form of GNUStep's implementation of Core Foundation's CFSTR() macro. My similar implementation looks like this:
#define MY_STR(str) ({\
static struct { const char *buffer; my_bool shouldFree; my_bool mutable; my_bool constant; } s = {NULL, MY_FALSE, MY_FALSE, MY_TRUE};\
s.buffer = str;\
(my_string *)&s;\
})
The reason this works is because the block of code gets inlined during compile time, which means it creates a statically-allocated struct relative to the local scope. Thus, if (e.g.) a function containing MY_STR("Hello, world!") is called multiple times, the same statically-allocated struct will always be returned, resulting in our desired behavior.
This concept could easily be extended beyond things like strings, allowing you to easily create your own statically-allocated object types. Neat!

Related

Initialize flexible array C struct member from Go

I am trying to initialize a struct of C array in go side.
I am new to cgo. Still trying to understand the use case.
test.h
typedef struct reply {
char *name;
reply_cb callback_fn;
} reply_t;
typedef struct common {
char *name;
int count;
reply_t reply[];
} common_t;
int
init_s (common_t *service);
test.go
name := C.CString("ABCD")
defer C.free(unsafe.Pointer(name))
num := C.int(3)
r := [3]C.reply_t{{C.CString("AB"), (C.s_cb)(unsafe.Pointer(C.g_cb))},
{C.CString("BC"), (C.s_cb)(unsafe.Pointer(C.g_cb))},
{C.CString("CD"), (C.s_cb)(unsafe.Pointer(C.g_cb))}}
g := C.common_t{
name: name,
count: num,
reply : r,
}
rc := C.init_s(&g)
I am getting error on "reply: r" unknown field 'r' in struct literal of type
Any help will be appreciated. The goal is initialize and then use it values in C init_s for processing.
You cannot use a flexible array field from Go: https://go-review.googlesource.com/c/go/+/12864/.
I think the reasonong is simple: this wart of C normally requires you to perform a trick of allocating a properly-aligned memory buffer long enough to accomodate for the sizeof(struct_type) itself at the beginning of that buffer plus sizeof(array_member[0]) * array_element_count bytes. This does not map to Go's type system because in it, structs have fixed size known at compile time. If Go would not hide reply from the definition, it would refer to a zero-length field you cannot do anything useful with anyway—see #20275.
Don't be deceived by code examples where a flexible array member field is initialized with a literal: as torek pointed out, it's a GCC extension, but what is more important, it requires work on part of the compiler—that is, it analyzes the literal, understands the context it appeared in and generates a code which allocates large enough memory block to accomodate both the struct and all the members of the flexible array.
The initialization of the array in your Go code may look superficially similar but it has an important difference: it allocates a separate array which has nothing to do with the memory block of the struct it's supposed to be "put into".
What's more Go's array are different beasts than C's: in C, arrays are pointers in disguise, in Go, arrays are first-class citizens and when you assign an array or pass it to a function call, the whole array is copied by value—as opposed to "decaying into a pointer"—in C's terms.
So even if the Go compiler would not hide the reply field, assignment to it would fail.
I think you cannot directly use values of this type from Go without additional helper code written in C. For instance, to initialize values of common_t, you would write a C helper which would first allocate a memory buffer long enough and then expose to the Go code a pair of pointers: to the beginning of the buffer (of type *C.common_t), and to the first element of the array—as *C.reply_t.
If this C code is the code you own, I'd recommend to just get rid of the flexible array and maintain a pointer to a "normal" array in the reply field.
Yes, this would mean extra pointer chasing for the CPU but it will be simpler to interoperate with Go.

Copying one structure to another

I know that I can copy the structure member by member, instead of that can I do a memcpy on structures?
Is it advisable to do so?
In my structure, I have a string also as member which I have to copy to another structure having the same member. How do I do that?
Copying by plain assignment is best, since it's shorter, easier to read, and has a higher level of abstraction. Instead of saying (to the human reader of the code) "copy these bits from here to there", and requiring the reader to think about the size argument to the copy, you're just doing a plain assignment ("copy this value from here to here"). There can be no hesitation about whether or not the size is correct.
Also, if the structure is heavily padded, assignment might make the compiler emit something more efficient, since it doesn't have to copy the padding (and it knows where it is), but mempcy() doesn't so it will always copy the exact number of bytes you tell it to copy.
If your string is an actual array, i.e.:
struct {
char string[32];
size_t len;
} a, b;
strcpy(a.string, "hello");
a.len = strlen(a.string);
Then you can still use plain assignment:
b = a;
To get a complete copy. For variable-length data modelled like this though, this is not the most efficient way to do the copy since the entire array will always be copied.
Beware though, that copying structs that contain pointers to heap-allocated memory can be a bit dangerous, since by doing so you're aliasing the pointer, and typically making it ambiguous who owns the pointer after the copying operation.
For these situations a "deep copy" is really the only choice, and that needs to go in a function.
Since C90, you can simply use:
dest_struct = source_struct;
as long as the string is memorized inside an array:
struct xxx {
char theString[100];
};
Otherwise, if it's a pointer, you'll need to copy it by hand.
struct xxx {
char* theString;
};
dest_struct = source_struct;
dest_struct.theString = malloc(strlen(source_struct.theString) + 1);
strcpy(dest_struct.theString, source_struct.theString);
If the structures are of compatible types, yes, you can, with something like:
memcpy (dest_struct, source_struct, sizeof (*dest_struct));
The only thing you need to be aware of is that this is a shallow copy. In other words, if you have a char * pointing to a specific string, both structures will point to the same string.
And changing the contents of one of those string fields (the data that the char * points to, not the char * itself) will change the other as well.
If you want a easy copy without having to manually do each field but with the added bonus of non-shallow string copies, use strdup:
memcpy (dest_struct, source_struct, sizeof (*dest_struct));
dest_struct->strptr = strdup (source_struct->strptr);
This will copy the entire contents of the structure, then deep-copy the string, effectively giving a separate string to each structure.
And, if your C implementation doesn't have a strdup (it's not part of the ISO standard), get one from here.
You can memcpy structs, or you can just assign them like any other value.
struct {int a, b;} c, d;
c.a = c.b = 10;
d = c;
In C, memcpy is only foolishly risky. As long as you get all three parameters exactly right, none of the struct members are pointers (or, you explicitly intend to do a shallow copy) and there aren't large alignment gaps in the struct that memcpy is going to waste time looping through (or performance never matters), then by all means, memcpy. You gain nothing except code that is harder to read, fragile to future changes and has to be hand-verified in code reviews (because the compiler can't), but hey yeah sure why not.
In C++, we advance to the ludicrously risky. You may have members of types which are not safely memcpyable, like std::string, which will cause your receiving struct to become a dangerous weapon, randomly corrupting memory whenever used. You may get surprises involving virtual functions when emulating slice-copies. The optimizer, which can do wondrous things for you because it has a guarantee of full type knowledge when it compiles =, can do nothing for your memcpy call.
In C++ there's a rule of thumb - if you see memcpy or memset, something's wrong. There are rare cases when this is not true, but they do not involve structs. You use memcpy when, and only when, you have reason to blindly copy bytes.
Assignment on the other hand is simple to read, checks correctness at compile time and then intelligently moves values at runtime. There is no downside.
You can use the following solution to accomplish your goal:
struct student
{
char name[20];
char country[20];
};
void main()
{
struct student S={"Wolverine","America"};
struct student X;
X=S;
printf("%s%s",X.name,X.country);
}
You can use a struct to read write into a file.
You do not need to cast it as a `char*.
Struct size will also be preserved.
(This point is not closest to the topic but guess it:
behaving on hard memory is often similar to RAM one.)
To move (to & from) a single string field you must use strncpy
and a transient string buffer '\0' terminating.
Somewhere you must remember the length of the record string field.
To move other fields you can use the dot notation, ex.:
NodeB->one=intvar;
floatvar2=(NodeA->insidebisnode_subvar).myfl;
struct mynode {
int one;
int two;
char txt3[3];
struct{char txt2[6];}txt2fi;
struct insidenode{
char txt[8];
long int myl;
void * mypointer;
size_t myst;
long long myll;
} insidenode_subvar;
struct insidebisnode{
float myfl;
} insidebisnode_subvar;
} mynode_subvar;
typedef struct mynode* Node;
...(main)
Node NodeA=malloc...
Node NodeB=malloc...
You can embed each string into a structs that fit it,
to evade point-2 and behave like Cobol:
NodeB->txt2fi=NodeA->txt2fi
...but you will still need of a transient string
plus one strncpy as mentioned at point-2 for scanf, printf
otherwise an operator longer input (shorter),
would have not be truncated (by spaces padded).
(NodeB->insidenode_subvar).mypointer=(NodeA->insidenode_subvar).mypointer
will create a pointer alias.
NodeB.txt3=NodeA.txt3
causes the compiler to reject:
error: incompatible types when assigning to type ‘char[3]’ from type ‘char *’
point-4 works only because NodeB->txt2fi & NodeA->txt2fi belong to the same typedef !!
A correct and simple answer to this topic I found at
In C, why can't I assign a string to a char array after it's declared?
"Arrays (also of chars) are second-class citizens in C"!!!

Dynamically create an array of TYPE in C

I've seen many posts for c++/java, but nothing for C. Is it possible to allocate memory for an array of type X dynamically during run time? For example, in pseudo,
switch(data_type)
case1:float, create a new array of floats to use in the rest of the program
case2:int, create new array of ints to use in the rest of the program
case3:unsigned, ....
// etc.
In my program I determine the data type from a text header file during run time, and then I need to create an appropriate array to store/manipulate data. Is there some kind of generic type in C?
EDIT: I need to dynamically create and DECIDE which array should be created.
Thanks,
csand
Assuming you calculate the total size, in bytes, required from the array, you can just allocate that much memory and assign it to the correct pointer type.
Ex:
void * data_ptr = malloc( data_sz );
then you can assign it to a pointer for whatever type you want:
int *array1 = (int *)data_ptr;
or
float *array2 = (float *)data_ptr;
NOTE: malloc allocates memory on the heap, so it will not be automatically freed. Make sure you free the memory you allocate at some point.
UPDATE
enum {
DATA_TYPE_INT,
DATA_TYPE_FLOAT,
...
};
typedef struct {
int data_type;
union {
float * float_ptr;
int * int_ptr;
...
} data_ptr;
} data;
While this might allow you to store the pointer and tell what type of pointer you should be using, it still leaves the problem of not having to branch the behavior depending on the data type. That will be difficult because the compiler has to know the data type for assignments etc.
You're going to have a hard time doing this in C because C is statically typed and has no run-time type information. Every line of C code has to know exactly what type it is dealing with.
However, C comes with a nifty and much-abused macro preprocessor that lets you (among other things) define new functions that differ only in the static type. For example:
#define FOO_FUNCTION(t) t foo_function_##t(t a, t b) { return a + b; }
FOO_FUNCTION(int)
FOO_FUNCTION(float)
This gets you 2 functions, foo_function_int and foo_function_float, which are identical other than the name and type signature. If you're not familiar with the C preprocessor, be warned it has all sorts of fun gotchas, so read up on it before embarking on rewriting chunks of your program as macros.
Without knowing what your program looks like, I don't know how feasible this approach will be for you, but often the macro preprocessor can help you pretend that you're using a language that supports generic programming.

a few beginner C questions

I'm sort of learning C, I'm not a beginner to programming though, I "know" Java and python, and by the way I'm on a mac (leopard).
Firstly,
1: could someone explain when to use a pointer and when not to?
2:
char *fun = malloc(sizeof(char) * 4);
or
char fun[4];
or
char *fun = "fun";
And then all but the last would set indexes 0, 1, 2 and 3 to 'f', 'u', 'n' and '\0' respectively. My question is, why isn't the second one a pointer? Why char fun[4] and not char *fun[4]? And how come it seems that a pointer to a struct or an int is always an array?
3:
I understand this:
typedef struct car
{
...
};
is a shortcut for
struct car
{
...
};
typedef struct car car;
Correct? But something I am really confused about:
typedef struct A
{
...
}B;
What is the difference between A and B? A is the 'tag-name', but what's that? When do I use which? Same thing for enums.
4. I understand what pointers do, but I don't understand what the point of them is (no pun intended). And when does something get allocated on the stack vs. the heap? How do I know where it gets allocated? Do pointers have something to do with it?
5. And lastly, know any good tutorial for C game programming (simple) ? And for mac/OS X, not windows?
PS. Is there any other name people use to refer to just C, not C++? I hate how they're all named almost the same thing, so hard to try to google specifically C and not just get C++ and C# stuff.
Thanks!!
It was hard to pick a best answer, they were all great, but the one I picked was the only one that made me understand my 3rd question, which was the only one I was originally going to ask. Thanks again!
My question is, why isn't the second one a pointer?
Because it declares an array. In the two other cases, you have a pointer that refers to data that lives somewhere else. Your array declaration, however, declares an array of data that lives where it's declared. If you declared it within a function, then data will die when you return from that function. Finally char *fun[4] would be an array of 4 pointers - it wouldn't be a char pointer. In case you just want to point to a block of 4 chars, then char* would fully suffice, no need to tell it that there are exactly 4 chars to be pointed to.
The first way which creates an object on the heap is used if you need data to live from thereon until the matching free call. The data will survive a return from a function.
The last way just creates data that's not intended to be written to. It's a pointer which refers to a string literal - it's often stored in read-only memory. If you write to it, then the behavior is undefined.
I understand what pointers do, but I don't understand what the point of them is (no pun intended).
Pointers are used to point to something (no pun, of course). Look at it like this: If you have a row of items on the table, and your friend says "pick the second item", then the item won't magically walk its way to you. You have to grab it. Your hand acts like a pointer, and when you move your hand back to you, you dereference that pointer and get the item. The row of items can be seen as an array of items:
And how come it seems that a pointer to a struct or an int is always an array?
item row[5];
When you do item i = row[1]; then you first point your hand at the first item (get a pointer to the first one), and then you advance till you are at the second item. Then you take your hand with the item back to you :) So, the row[1] syntax is not something special to arrays, but rather special to pointers - it's equivalent to *(row + 1), and a temporary pointer is made up when you use an array like that.
What is the difference between A and B? A is the 'tag-name', but what's that? When do I use which? Same thing for enums.
typedef struct car
{
...
};
That's not valid code. You basically said "define the type struct car { ... } to be referable by the following ordinary identifier" but you missed to tell it the identifier. The two following snippets are equivalent instead, as far as i can see
1)
struct car
{
...
};
typedef struct car car;
2)
typedef struct car
{
...
} car;
What is the difference between A and B? A is the 'tag-name', but what's that? When do I use which? Same thing for enums.
In our case, the identifier car was declared two times in the same scope. But the declarations won't conflict because each of the identifiers are in a different namespace. The two namespaces involved are the ordinary namespace and the tag namespace. A tag identifier needs to be used after a struct, union or enum keyword, while an ordinary identifier doesn't need anything around it. You may have heard of the POSIX function stat, whose interface looks like the following
struct stat {
...
};
int stat(const char *path, struct stat *buf);
In that code snippet, stat is registered into the two aforementioned namespaces too. struct stat will refer to the struct, and merely stat will refer to the function. Some people don't like to precede identifiers always with struct, union or enum. Those use typedef to introduce an ordinary identifier that will refer to the struct too. The identifier can of course be the same (both times car), or they can differ (one time A the other time B). It doesn't matter.
3) It's bad style to use two different names A and B:
typedef struct A
{
...
} B;
With that definition, you can say
struct A a;
B b;
b.field = 42;
a.field = b.field;
because the variables a and b have the same type. C programmers usually say
typedef struct A
{
...
} A;
so that you can use "A" as a type name, equivalent to "struct A" but it saves you a lot of typing.
Use them when you need to. Read some more examples and tutorials until you understand what pointers are, and this ought to be a lot clearer :)
The second case creates an array in memory, with space for four bytes. When you use that array's name, you magically get back a pointer to the first (index 0) element. And then the [] operator then actually works on a pointer, not an array - x[y] is equivalent to *(x + y). And yes, this means x[y] is the same as y[x]. Sorry.
Note also that when you add an integer to a pointer, it's multiplied by the size of the pointed-to elements, so if you do someIntArray[1], you get the second (index 1) element, not somewhere inbetween starting at the first byte.
Also, as a final gotcha - array types in function argument lists - eg, void foo(int bar[4]) - secretly get turned into pointer types - that is, void foo(int *bar). This is only the case in function arguments.
Your third example declares a struct type with two names - struct A and B. In pure C, the struct is mandatory for A - in C++, you can just refer to it as either A or B. Apart from the name change, the two types are completely equivalent, and you can substitute one for the other anywhere, anytime without any change in behavior.
C has three places things can be stored:
The stack - local variables in functions go here. For example:
void foo() {
int x; // on the stack
}
The heap - things go here when you allocate them explicitly with malloc, calloc, or realloc.
void foo() {
int *x; // on the stack
x = malloc(sizeof(*x)); // the value pointed to by x is on the heap
}
Static storage - global variables and static variables, allocated once at program startup.
int x; // static
void foo() {
static int y; // essentially a global that can only be used in foo()
}
No idea. I wish I didn't need to answer all questions at once - this is why you should split them up :)
Note: formatting looks ugly due to some sort of markdown bug, if anyone knows of a workaround please feel free to edit (and remove this note!)
char *fun = malloc(sizeof(char) * 4);
or
char fun[4];
or
char *fun = "fun";
The first one can be set to any size you want at runtime, and be resized later - you can also free the memory when you are done.
The second one is a pointer really 'fun' is the same as char ptr=&fun[0].
I understand what pointers do, but I don't understand what the point of
them is (no pun intended). And when
does something get allocated on the
stack vs. the heap? How do I know
where it gets allocated? Do pointers
have something to do with it?
When you define something in a function like "char fun[4]" it is defined on the stack and the memory isn't available outside the function.
Using malloc (or new in C++) reserves memory on the heap - you can make this data available anywhere in the program by passing it the pointer. This also lets you decide the size of the memory at runtime and finaly the size of the stack is limited (typically 1Mb) while on the heap you can reserve all the memory you have available.
edit 5. Not really - I would say pure C. C++ is (almost) a superset of C so unless you are working on a very limited embedded system it's usualy OK to use C++.
\5. Chipmunk
Fast and lightweight 2D rigid body physics library in C.
Designed with 2D video games in mind.
Lightweight C99 implementation with no external dependencies outside of the Std. C library.
Many language bindings available.
Simple, read the documentation and see!
Unrestrictive MIT license.
Makes you smarter, stronger and more attractive to the opposite gender!
...
In your second question:
char *fun = malloc(sizeof(char) * 4);
vs
char fun[4];
vs
char *fun = "fun";
These all involve an array of 4 chars, but that's where the similarity ends. Where they differ is in the lifetime, modifiability and initialisation of those chars.
The first one creates a single pointer to char object called fun - this pointer variable will live only from when this function starts until the function returns. It also calls the C standard library and asks it to dynamically create a memory block the size of an array of 4 chars, and assigns the location of the first char in the block to fun. This memory block (which you can treat as an array of 4 chars) has a flexible lifetime that's entirely up to the programmer - it lives until you pass that memory location to free(). Note that this means that the memory block created by malloc can live for a longer or shorter time than the pointer variable fun itself does. Note also that the association between fun and that memory block is not fixed - you can change fun so it points to different memory block, or make a different pointer point to that memory block.
One more thing - the array of 4 chars created by malloc is not initialised - it contains garbage values.
The second example creates only one object - an array of 4 chars, called fun. (To test this, change the 4 to 40 and print out sizeof(fun)). This array lives only until the function it's declared in returns (unless it's declared outside of a function, when it lives for as long as the entire program is running). This array of 4 chars isn't initialised either.
The third example creates two objects. The first is a pointer-to-char variable called fun, just like in the first example (and as usual, it lives from the start of this function until it returns). The other object is a bit strange - it's an array of 4 chars, initialised to { 'f', 'u', 'n', 0 }, which has no name and that lives for as long as the entire program is running. It's also not guaranteed to be modifiable (although what happens if you try to modify it is left entirely undefined - it might crash your program, or it might not). The variable fun is initialised with the location of this strange unnamed, unmodifiable, long-lived array (but just like in the first example, this association isn't permanent - you can make fun point to something else).
The reason why there's so many confusing similarities and differences between arrays and pointers is down to two things:
The "array syntax" in C (the [] operator) actually works on pointers, not arrays!
Trying to pin down an array is a bit like catching fog - in almost all cases the array evaporates and is replaced by a pointer to its first element instead.

How to dynamically create and read structs in C?

How can I do something like that (just an example):
any_struct *my_struct = create_struct();
add_struct_member(my_struct, "a", int_member);
add_struct_member(my_struct, "b", float_member);
So that I could load and use a struct instance "from the outside" (at the address addressOfMyStruct) with the given structure here?
any_struct_instance *instance = instance(my_struct, addressOfMyStruct);
int a = instance_get_member(instance, "a");
float b = instance_get_member(instance, "b");
I would also like to be able to create struct instances dynamically this way.
I hope it's clear what I want to do. I know that C/Invoke is able to do it, but is there a separate library to do that?
Actually demonstrating the code to make this work in C is a bit too involved for an SO post. But explaining the basic concept is doable.
What you're really creating here is a templated property bag system. The one thing you'll need a lot of to keep this going is some assiociative structure like a hash table. I'd say go with std::map but you mentioned this was a C only solution. For the sake of discussion I'm just going to assume you have some sort of hashtable available.
The "create_struct" call will need to return a structure which contains a pointer to a hashtable which makes const char* to essentially a size_t. This map defines what you need in order to create a new instance of the struct.
The "insance" method will essentially create a new hashtable with equal number of members as the template hashtable. Lets throw lazy evualation out the window for a second and assume you create all members up front. The method will need to loop over the template hashtable adding a member for every entry and malloc'ing a memory chunk of the specified size.
The implementation of instance_get_member will simply do a lookup in the map by name. The signature though and usage pattern will need to change though. C does not support templates and must chose a common return type that can represent all data. In this case you'll need to chose void* since that's how the memory will need to be stored.
void* instance_get_member(any_struct_instance* inst, const char* name);
You can make this a bit better by adding an envil macro to simulate templates
#define instance_get_member2(inst, name, type) \
*((type*)instance_get_member((inst),(name)))
...
int i = instance_get_member2(pInst,"a", int);
You've gone so far defining the problem that all that's left is a bit of (slightly tricky in some parts) implementation. You just need to keep track of the information:
typedef struct {
fieldType type;
char name[NAMEMAX];
/* anything else */
} meta_struct_field;
typedef struct {
unsigned num_fields;
meta_struct_field *fields;
/* anything else */
} meta_struct;
Then create_struct() allocates memory for meta_struct and initialized it to 0, and add_struct_member() does an alloc()/realloc() on my_struct.fields and increments my_struct.num_fields. The rest follows in the same vein.
You'll also want a union in meta_struct_field to hold actual values in instances.
I did some of this a long time ago.
The way I did it was to generate code containing the struct definition, plus all routines for accessing it and then compile and link it into a DLL "on the fly", then load that DLL dynamically.

Resources