How does C treat struct assignment

Suppose I have a struct like this:
typedef struct {
char *str;
int len;
} INS;
And an array of pointers to that struct:
INS *ins[N] = { &item, &item, ... };
When I access its elements not as pointers but as the structs themselves, are all the fields copied to a temporary local place?
for (int i = 0; i < N; i++) {
INS in = *ins[i];
// internally, would the above line be like:
// in.str = ins[i]->str;
// in.len = ins[i]->len;
}
So as I add fields to the structure, does the assignment become a more expensive operation?

Correct, in is a copy of *ins[i].
Never mind the memory consumption; your code will most likely not be correct: the object in dies at the end of each loop iteration, and any changes you make to in have no lasting effect on *ins[i]!
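A minimal sketch of that difference, using the INS struct from the question. The two helper names are made up for illustration: writing to a copy leaves the original untouched, while writing through the pointer changes it.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    char *str;
    int len;
} INS;

/* Hypothetical helper: modifies a local copy, so the caller's struct is unchanged. */
void set_len_on_copy(const INS *src, int new_len)
{
    assert(src != NULL);
    INS in = *src;      /* copies every member */
    in.len = new_len;   /* changes only the copy, which dies at return */
    (void)in;           /* silence "set but not used" warnings */
}

/* Hypothetical helper: modifies through the pointer, so the caller sees it. */
void set_len_in_place(INS *dst, int new_len)
{
    assert(dst != NULL);
    dst->len = new_len;
}
```

If the loop body is meant to update the array elements, working through ins[i] directly (or through an INS * local) is the variant that keeps the changes.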

The structure assignment behaves like a memcpy. Yes, it is more expensive for a larger structure. Paradoxically, the larger your structure becomes, the harder it is to measure the additional expense of adding another field.

Yes, structs have value semantics in C, so assigning one struct to another results in a member-wise copy. Keep in mind that pointer members are copied as pointers: they still point to the same objects.
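The shallow-copy behaviour can be sketched as follows, again with the INS struct from the question. The helper names shallow_copy and shares_buffer are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    char *str;
    int len;
} INS;

/* Hypothetical helper: returns 1 if a and b point at the same character buffer. */
int shares_buffer(const INS *a, const INS *b)
{
    assert(a != NULL && b != NULL);
    return a->str == b->str;
}

/* Plain assignment duplicates the pointer value, not the text it points to. */
INS shallow_copy(const INS *src)
{
    assert(src != NULL);
    INS copy = *src;
    return copy;
}
```

After a shallow copy, changing copy.len leaves the original alone, but writing through copy.str changes the text seen by both structs.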

The compiler may optimize away the copy of the structure and instead either access members directly from the array to supply the values needed in your C code that uses the copy or may copy just the individual members you use. A good compiler will do this.
Storing values via pointers can interfere with this optimization. For example, suppose your routine also has a pointer to int, p. When the compiler processes your code INS in = *ins[i], it could “think” something like this: “Copying *ins[i] is expensive. Instead, I will just remember that in is a copy, and I will fetch members for it later, when they are used.” However, if your code contains *p = 3, this could change *ins[i], unless the compiler is able to deduce that p does not point into *ins[i]. (There is a way to help the compiler make that deduction, with the restrict keyword.)
In summary: Operations that look expensive on the surface might be implemented efficiently by a good compiler. Operations that look cheap might be expensive (writing to *p breaks a big optimization). Generally, you should write code that clearly expresses your algorithm and let the compiler optimize.
To expand on how the compiler might optimize this. Suppose you write:
for (int i = 0; i < N; i++) {
INS in = *ins[i];
...
}
where the code in “...” accesses in.str and in.len but not any of the other 237 members you add to the INS struct. Then the compiler is free to, in effect, transform this code into:
for (int i = 0; i < N; i++) {
char *str = ins[i]->str;
int len = ins[i]->len;
...
}
That is, even though you wrote a statement that, on the surface, copies all of an INS struct, the compiler is only required to copy the parts that are actually needed. (Actually, it is not even required to copy those parts. It is only required to produce a program that gets the same results as if it had followed the source code directly.)
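As a hedged illustration of the restrict remark above: the function store_and_sum is hypothetical, and whether a particular compiler actually exploits the promise is not guaranteed. The qualifier (C99 and later) tells the compiler that the int written through out is not the same object as anything read through in.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    char *str;
    int len;
} INS;

/*
 * restrict is a promise by the caller that, inside this function, the
 * object written through out is not also accessed through in. The store
 * to *out therefore cannot change in->len, and the compiler is free to
 * read in->len before or after the store.
 */
int store_and_sum(const INS *restrict in, int *restrict out)
{
    assert(in != NULL && out != NULL);
    *out = 3;
    return in->len + *out;
}
```

The promise cuts both ways: calling this with out pointing at the len member of the same struct breaks it, and the behaviour is then undefined.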

Related

How to perform efficient vector initialization in Rust?

What's a good way to fill in a vector of structs in Rust where:
The size is dynamic, but known at the time of initialization.
Doesn't first initialize the memory to a dummy value.
Doesn't re-allocate memory as it's filled.
In this example, all members of the vector are always initialized (in keeping with Rust's assurance of no undefined behavior).
And ideally
Doesn't bounds-check each index access (since the size is known when declaring the vector, this should be possible).
Doesn't require unsafe (not sure if this is reasonable; however, the compiler _could_ detect that all values are always filled, allowing such logic in an unsafe block).
The C equivalent is:
struct MyStruct *create_mystruct(const uint n) {
struct MyStruct *vector = malloc(sizeof(*vector) * n);
for (uint i = 0; i < n; i++) {
/* any kind of initialization */
initialize_mystruct(&vector[i], i);
}
return vector;
}
I'm porting over some C code which fills an array in a simple loop, so I was wondering if there was a Rustic way to perform such a common task with zero or at least minimal overhead?
If there are typically some extra checks needed for the Rust version of this code, what's the nearest equivalent?
Just use map and collect.
struct MyStruct(usize);
fn create_mystructs(n: usize) -> Vec<MyStruct> {
(0..n).map(MyStruct).collect()
}
"Initializing" doesn't make sense in safe Rust because you'd need to have the ability to access the uninitialized values, which is unsafe. The Iterator::size_hint method can be used when collecting into a container to ensure that a minimum number of allocations is made.
Basically, I'd trust that the optimizer will do the right thing here. If it doesn't, I'd believe that it eventually will.

How to include a variable-sized array as struct member in C?

I must say, I have quite a conundrum in a seemingly elementary problem. I have a structure in which I would like to store an array as a field. I'd like to reuse this structure in different contexts, and sometimes I need a bigger array, sometimes a smaller one. C prohibits variable-sized arrays as struct members, so the natural approach would be to declare a pointer to the array as a struct member:
struct my {
    struct other* array;
};
The problem with this approach however, is that I have to obey the rules of MISRA-C, which prohibits dynamic memory allocation. So then if I'd like to allocate memory and initialize the array, I'm forced to do:
var.array = malloc(n * sizeof(...));
which is forbidden by MISRA standards. How else can I do this?
Since you are following MISRA-C, I would guess that the software is somehow mission-critical, in which case all memory allocation must be deterministic. Heap allocation is banned by every safety standard out there, not just by MISRA-C but by the more general safety standards as well (IEC 61508, ISO 26262, DO-178 and so on).
In such systems, you must always design for the worst-case scenario, which will consume the most memory. You need to allocate exactly that much space, no more, no less. Everything else does not make sense in such a system.
Given those pre-requisites, you must allocate a static buffer of size LARGE_ENOUGH_FOR_WORST_CASE. Once you have realized this, you simply need to find a way to keep track of what kind of data you have stored in this buffer, by using an enum and maybe a "size used" counter.
Please note that not just malloc/calloc but also VLAs and flexible array members are banned by MISRA-C:2012. And if you are using C90/MISRA-C:2004, there are no VLAs, nor is there any well-defined use of flexible array members: they invoked undefined behavior until C99.
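A minimal sketch of the worst-case static buffer with a "size used" counter. The capacity MAX_ITEMS, the contents of struct other, and the helper my_push are assumptions for illustration, and the code has not been checked against any MISRA-C rule set.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ITEMS 32                 /* hypothetical worst-case capacity */

struct other { int value; };         /* placeholder element type */

struct my {
    struct other array[MAX_ITEMS];   /* always sized for the worst case */
    size_t used;                     /* how many slots are currently in use */
};

/* Appends one element; returns 1 on success, 0 if the buffer is full. */
int my_push(struct my *m, struct other item)
{
    assert(m != NULL);
    if (m->used >= MAX_ITEMS) {
        return 0;
    }
    m->array[m->used] = item;
    m->used++;
    return 1;
}
```

Every struct my then costs the full worst-case amount of memory, which is the trade-off the answer describes: memory use is fixed and known at build time.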
Edit: This solution does not conform to MISRA-C rules.
You can kind of include VLAs in a struct definition, but only when it's inside a function. A way to get around this is to use a "flexible array member" at the end of your main struct, like so:
#include <stdio.h>
struct my {
int len;
int array[];
};
You can create functions that operate on this struct.
void print_my(struct my *my) {
int i;
for (i = 0; i < my->len; i++) {
printf("%d\n", my->array[i]);
}
}
Then, to create variable length versions of this struct, you can create a new type of struct in your function body, containing your my struct, but also defining a length for that buffer. This can be done with a varying size parameter. Then, for all the functions you call, you can just pass around a pointer to the contained struct my value, and they will work correctly.
void create_and_use_my(int nelements) {
int i;
// Declare the containing struct with variable number of elements.
struct {
struct my my;
int array[nelements];
} my_wrapper;
// Initialize the values in the struct.
my_wrapper.my.len = nelements;
for (i = 0; i < nelements; i++) {
my_wrapper.my.array[i] = i;
}
// Print the struct using the generic function above.
print_my(&my_wrapper.my);
}
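You can call this function with any value of nelements and it will work fine with GCC. Note that this is not strictly conforming C99: VLAs themselves require C99, but a VLA as a struct member, and a struct with a flexible array member nested inside another struct, are only accepted as GCC extensions.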
Important: If you pass the struct my to another function, and not a pointer to it, I can pretty much guarantee you it will cause all sorts of errors, since it won't copy the variable length array with it.
Here's a thought that may be totally inappropriate for your situation, but given your constraints I'm not sure how else to deal with it.
Create a large static array and use this as your "heap":
static struct other heap[SOME_BIG_NUMBER];
You'll then "allocate" memory from this "heap" like so:
var.array = &heap[start_point];
You'll have to do some bookkeeping to keep track of what parts of your "heap" have been allocated. This assumes that you don't have any major constraints on the size of your executable.
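The simplest possible bookkeeping is a counter that only moves forward. The sketch below assumes a placeholder struct other, an arbitrary SOME_BIG_NUMBER, and a hypothetical pool_alloc function; it never releases memory, so it only suits data that lives for the whole run.

```c
#include <assert.h>
#include <stddef.h>

#define SOME_BIG_NUMBER 64           /* arbitrary size for illustration */

struct other { int value; };         /* placeholder element type */

static struct other heap[SOME_BIG_NUMBER];
static size_t next_free = 0;         /* index of the first unallocated element */

/* Hands out n consecutive elements, or NULL if not enough remain. */
struct other *pool_alloc(size_t n)
{
    assert(next_free <= SOME_BIG_NUMBER);   /* internal invariant */
    if (n == 0 || n > SOME_BIG_NUMBER - next_free) {
        return NULL;
    }
    struct other *start = &heap[next_free];
    next_free += n;
    return start;
}
```

With this in place, var.array = pool_alloc(n); plays the role of the malloc call, and a NULL result signals that the static pool is exhausted.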

Can I eliminate the usage of pointer during static analysis in this way?

So basically I am using CIL (writing a CIL extension) to simplify some C code, and what I am trying to do is eliminate the use of pointers (because pointers could cause a lot of trouble in our next analysis step).
Here is the code:
void foo()
{
int a = 1;
int* p = &a;
int c = *p;
*p = 3;
}
I am thinking of maintaining a map of pointer-to-target relations in my CIL extension, so that the simplified C code could be:
void foo()
{
int a = 1;
// int *p = &a; // map: {p : a} just eliminate this code, and create new entry in map
int c = a; // substitute based on map
a = 3; // substitute based on map
}
This is the easiest situation, and it looks promising.
But things could get complicated, for example when returning a pointer (then I have to change the type of the function; of course, that is also straightforward in CIL).
So my questions are:
Is it universally doable to eliminate pointers in this way?
Is there any undecidable situation?
I'm not sure about what you are actually trying to achieve, but eliminating all pointer uses is next to impossible. You may be able to eliminate some uses (like the rather trivial code in your example), but you will most likely not be able to eliminate all, simply because indirection is used in so many interesting ways.
Take for example a linked list. If you try to eliminate the pointers linking the list, you would need to give a name to each list node that is ever created by the program. Of course, you can replace the list by an array (which involves pointer arithmetic of its own), but that won't help you with a binary tree. And that is not the end of it, there are hash tables, virtually arbitrary linking between objects that results from object oriented programming, etc.
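To make the list-as-array remark concrete, here is a rough sketch with hypothetical names (value, next, list_push, list_sum). The pointers are gone from the source text, but the indices in next behave exactly like pointers, so an analysis still has to track where each one leads.

```c
#include <assert.h>

#define MAX_NODES 8
#define NIL (-1)                 /* plays the role of a null pointer */

static int value[MAX_NODES];
static int next[MAX_NODES];      /* index of the following node, or NIL */
static int node_count = 0;

/* Pushes v in front of the list starting at head; returns the new head. */
int list_push(int head, int v)
{
    assert(node_count < MAX_NODES);
    value[node_count] = v;
    next[node_count] = head;
    return node_count++;
}

/* Walks the chain of indices, just as a pointer-based list would be walked. */
int list_sum(int head)
{
    int sum = 0;
    for (int i = head; i != NIL; i = next[i]) {
        sum += value[i];
    }
    return sum;
}
```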

Typecasting of pointers in C

I know a pointer to one type may be converted to a pointer of another type. I have three questions:
What should be kept in mind while typecasting pointers?
What exceptions/errors may arise in the resulting pointer?
What are best practices to avoid exceptions/errors?
A well-written program usually does not use much pointer typecasting. There could be a need to cast the result of malloc, for instance (it is declared as returning void *), but that is not even necessary in C (though a few compilers may complain).
int *p = malloc(sizeof(int)); // no need of (int *)malloc(...)
However, in system applications you sometimes want to use a trick to perform a binary or otherwise specific operation, and C, a language close to the machine, is convenient for that. For instance, say you want to analyze the binary structure of a double (which follows the IEEE 754 format) and working with the individual bytes is simpler; you may declare
typedef unsigned char byte;
double d = 0.9;
byte *p = (byte *)&d; /* any object may be inspected as unsigned char */
int i;
for (i = 0; i < (int)sizeof(double); i++) { ... work with p[i] ... }
You may also use a union; this is just one example.
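A third option, shown here instead of the unsigned char pointer or the union, is to copy the bytes into an integer with memcpy. This is a sketch under the assumption that double is a 64-bit IEEE 754 binary64 value; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns the object representation of d as a 64-bit integer. */
uint64_t double_bits(double d)
{
    uint64_t bits;
    assert(sizeof bits == sizeof d);   /* assumption: double is 64 bits wide */
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Extracts the 11-bit biased exponent, assuming IEEE 754 binary64. */
unsigned biased_exponent(double d)
{
    return (unsigned)((double_bits(d) >> 52) & 0x7FFu);
}
```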
A more complex use could be simulating C++ polymorphism, which requires storing the "class" (structure) hierarchy somewhere to remember what is what, and casting pointers so that, for instance, a parent "class" pointer variable can point at some time to a derived "class":
CRectangle rect;
CPolygon *p = (CPolygon *)&rect;
p->whatami = POLY_RECTANGLE; // a way to simulate polymorphism ...
process_poly ( p );
But in this case, maybe it's better to directly use C++!
Pointer typecasts are to be used carefully, in well-defined situations identified during program analysis, before development starts.
Pointer typecast potential dangers
using them when it's not necessary: that is error-prone and complicates the program
pointing to an object of a different size, which may lead to an out-of-bounds access, a wrong result...
pointing to two different structures, like s1 *p = (s1 *)&s2; : relying on their size and alignment may lead to an error
(But to be fair, a skilled C programmer wouldn't commit the above mistakes...)
Best practice
use them only if you do need them, and comment that part well to explain why it is necessary
know what you are doing (again, a skilled programmer may use tons of pointer typecasts without fail); i.e. don't "try and see": it may work on one system / version / OS and not on another
In plain C you can cast any pointer type to any other pointer type. If you cast a pointer to or from an incompatible type and then write to the memory incorrectly, you may get a segmentation fault or unexpected results from your application.
Here is a sample code of casting structure pointers:
struct Entity {
    int type;
};
struct DetailedEntity1 {
    int type;
    short val1;
};
struct DetailedEntity2 {
    int type;
    long val;
    long val2;
};
// random code:
struct Entity* ent = (struct Entity*)ptr;
// bad: nothing guarantees that ent really points to a DetailedEntity1
struct DetailedEntity1* ent1 = (struct DetailedEntity1*)ent;
int a = ent1->val1; // may be an error here, invalid read
ent1->val1 = 117; // possible invalid write
// OK: check the type tag before casting
if (ent->type == DETAILED_ENTITY_1) {
    ((struct DetailedEntity1*)ent)->val1;
} else if (ent->type == DETAILED_ENTITY_2) {
    ((struct DetailedEntity2*)ent)->val2;
}
As for function pointers - you should always use functions which exactly fit the declaration. Otherwise you may get unexpected results or segfaults.
When casting from pointer to pointer (structure or not) you must ensure that the memory is aligned in the exact same way. When casting entire structures the best way to ensure it is to use the same order of the same variables at the start, and differentiating structures only after the "common header". Also remember, that memory alignment may differ from machine to machine, so you can't just send a struct pointer as a byte array and receive it as byte array. You may experience unexpected behaviour or even segfaults.
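One way to get the "common header" without relying on matching layouts by hand is to embed the header struct as the first member. C guarantees that a pointer to a struct, suitably converted, points to its first member and vice versa. The sketch below reuses the names from the sample above; the enum and the function entity_value are assumptions added for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum entity_type { DETAILED_ENTITY_1, DETAILED_ENTITY_2 };

struct Entity { enum entity_type type; };

struct DetailedEntity1 {
    struct Entity base;   /* common header: must be the first member */
    short val1;
};

struct DetailedEntity2 {
    struct Entity base;   /* common header: must be the first member */
    long val;
    long val2;
};

/* Reads a payload value after checking the tag; returns 0 for unknown tags. */
long entity_value(const struct Entity *ent)
{
    assert(ent != NULL);
    switch (ent->type) {
    case DETAILED_ENTITY_1:
        return ((const struct DetailedEntity1 *)ent)->val1;
    case DETAILED_ENTITY_2:
        return ((const struct DetailedEntity2 *)ent)->val2;
    default:
        return 0;
    }
}
```

The cast is only valid when ent really is the base member of the matching detailed struct, which is why the tag must be set correctly when the object is created.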
When casting smaller to larger variable pointers, you must be very careful. Consider this code:
char* ptr = malloc (16);
ptr++;
uint64_t* uintPtr = (uint64_t*)ptr; // dereferencing may cause an error, memory is not properly aligned
And also, there is the strict aliasing rule that you should follow.
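A common way to sidestep both the alignment and the aliasing problem is to copy the bytes with memcpy instead of dereferencing a cast pointer. A minimal sketch follows; the function name load_u64 is made up, and the value is read in the host's byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads 8 bytes starting at p, which may be unaligned, in host byte order. */
uint64_t load_u64(const unsigned char *p)
{
    uint64_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);   /* byte-wise copy: no alignment requirement on p */
    return v;
}
```

Compilers commonly turn a fixed-size memcpy like this into a single load where the target allows it, but that is an optimization, not a guarantee.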
You probably need to look at ... the C FAQ maintained by Steve Summit (which used to be posted in the newsgroups, which means it was read and updated by a lot of the best programmers of the time, sometimes the designers of the language itself).
There is an abridged version too, which is maybe more palatable and still very, very useful. Reading the whole abridged version is, I believe, mandatory if you use C.

C memory issue with char*

I need help with my C code. I have a function that sets a spot in memory to the value passed to the function.
The issue I am facing is that if the pointer moves past the allocated amount of memory, it should throw an error. I am not sure how to check for this.
unsigned char *the_pool = malloc(1000);
char *num = (char *)the_pool; // a pointer to the start of the_pool, meant to cover ten spots
num[i] = val;
num[11] = val; //This should throw an error in my function which
So how can I check whether I have moved into unauthorized memory space?
C will not catch this error for you. You must do it yourself.
For example, you could safely wrap access to your array in a function:
typedef struct
{
char *data;
int length;
} myArrayType;
void MakeArray( myArrayType *p, int length )
{
p->data = (char *)malloc(length);
p->length = length;
}
int WriteToArrayWithBoundsChecking( myArrayType *p, int index, char value )
{
if ( index >= 0 && index < p->length )
{
p->data[index] = value;
return 1; // return "success"
}
else
{
return 0; // return "failure"
}
}
Then you can look at the return value of WriteToArrayWithBoundsChecking() to see if your write succeeded or not.
Of course you must remember to clean up the memory pointed at by myArrayType->data when you are done. Otherwise you will cause a leak.
Don't you mean?
num[11] = val
Yes, there is no way to check that an index is beyond bounds except doing it yourself; C provides no way to do this. Also note that arrays start at zero, so num[10] is also beyond bounds.
The standard defines this as undefined behavior.
It might work, it might not; you never know. When coding in C/C++, make sure you check bounds before accessing your arrays.
Common C compilers will not perform array bounds checking for you.
Some compilers are available that claim to support array bounds checking, but their performance is usually poor enough compared to the normal compilers that they are usually not distributed far and wide.
There are even dialects of C intended to provide memory safety, but again, these usually do not get very far. (The linked Cyclone, for example, only supports 32 bit platforms, last time I looked into it.)
You may build your own data structures to provide bounds checking if you wish. If you maintain a structure that includes a pointer to the start of your data, a data member that holds the allocated size, and functions that work on the structure, you can implement all this. But the onus is entirely on you or your environment to provide these data structures.
I guess you could use sizeof to keep your array accesses within bounds. But C allows you to write code that accesses memory outside your array's bounds. The C compiler accepts such code, and it is left to the OS to react (if it does at all) when you do that.
C/C++ doesn't actually do any bounds checking on arrays. It depends on the OS to ensure that you are accessing valid memory.
You could use an array like this:
type name[size];
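With a true array declared this way (as opposed to a pointer into malloc'ed memory), sizeof can supply the element count for a manual bounds check. A small sketch: the macro ARRAY_LEN and the functions num_set and num_get are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array; gives a meaningless result for a pointer. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

static char num[10];

/* Writes val at index i if it is inside num; returns 1 on success, 0 otherwise. */
int num_set(size_t i, char val)
{
    if (i >= ARRAY_LEN(num)) {
        return 0;
    }
    num[i] = val;
    return 1;
}

/* Alternative style: treat an out-of-range index as a programming error. */
char num_get(size_t i)
{
    assert(i < ARRAY_LEN(num));
    return num[i];
}
```

The sizeof trick stops working as soon as the array decays to a pointer, for example when it is passed to a function, which is why the struct-with-length approach is needed for malloc'ed buffers.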
If you are using Visual Studio 2010 (or 2011 Beta), it will tell you after you try to free the allocated memory.
There are advanced tools to check for leaked memory.
In your example, you have indeed moved into unauthorized memory space. Your indexes should be between 0 and 999 (inclusive).
