I started out trying to write a small program that translates basic arithmetic into English, and I ended up building a binary tree (which is inevitably very unbalanced) to represent the order of evaluation.
First, I wrote
typedef struct expr expr; /* forward typedef so the struct can refer to itself */
struct expr{
unsigned char entity_flag; /*positive when the oprd
struct represents an entity
---a single digit or a parenthesized block*/
char oprt;
expr * l_oprd;// these two point to the children nodes
expr * r_oprd;
};
However, to efficiently represent single digits, I prefer
typedef struct{
unsigned char entity_flag;
int ival;
} digit;
Since the "oprd" fields of each "expr" struct may now point to either of the structs above, I changed their types to
void * l_oprd;
void * r_oprd;
Then comes the "central question":
how can you access members through a void pointer?
Please see the following code:
#include<stdio.h>
#include<stdlib.h>
typedef struct {
int i1;
int i2;} s;
int main(void){
void* p=malloc(sizeof(s));
//p->i1=1;
//p->i2=2;
*(int*)p=1;
*((int *)p+1)=2;
printf("s{i1:%d, i2: %d}\n",*(int*)p,*((int *)p+1));
}
The compiler won't accept the commented-out version!
Do I have to use the cluttered approach above?
Please help.
PS: As you may have noticed, each of the structs above has a field named "entity_flag", so
void * vp;
...(giving some value to vp)
unsigned char flag=vp->entity_flag;
might extract the flag regardless of what the void pointer points to. Is this allowed in C? Is it even "safe" in C?
You can't access members through a void * pointer. You could cast it (indeed, in C you don't even need to write the cast explicitly when converting from void *), but even that is the wrong answer.
The correct answer is to use a union:
typedef union expr expr; /* forward typedef so the members can point to expr */
union expr {
struct{
unsigned char entity_flag; /*positive when the oprd
struct represents an entity
---a single digit or a parenthesized block*/
char oprt;
expr * l_oprd;// these two point to the children nodes
expr * r_oprd;
} expr;
struct{
unsigned char entity_flag;
int ival;
} digit;
};
You then access an expression like this (given a variable expr *e):
e->expr.entity_flag;
And a digit like this:
e->digit.entity_flag;
Any other solution is a nasty hack, IMO, and most of the casting solutions will risk breaking the "strict aliasing" rules that say that the compiler is allowed to assume that two pointers of different types can't reference the same memory.
Edit ...
If you need to be able to inspect the data itself in order to figure out which member of the union is in use, you can.
Basically, if the leading fields of two structs are declared identically, they will have the same binary representation. The C standard guarantees this for structs that share a common initial sequence inside a union; in practice it also holds in general across all binaries compiled for a given architecture (if you think about it, this is essential for libraries to work).
In unions it is common to pull those out into a separate struct so that it's obvious what you're doing, although it's not required:
union {
struct {
int ID;
} base;
struct {
int ID;
char *data;
} A;
struct {
int ID;
int *numeric_data;
} B;
};
In this scheme, p->base.ID, p->A.ID, p->B.ID are guaranteed to read the same.
Just convert p to the relevant pointer type:
s *a = p;
a->i1 = 42;
a->i2 = 31;
or
((s *) p)->i1 = 42;
((s *) p)->i2 = 31;
You could cast it:
((s*)p)->i1=1;
((s*)p)->i2=2;
I don't see any entity_flag in struct s but if you mean expr the same applies:
unsigned char flag=((expr*)vp)->entity_flag;
If you know the offset of your struct member, you can do pointer arithmetic and then cast to the appropriate type according to the value of entity_flag.
I would strongly suggest giving both structures the same alignment and using the same number of bytes for oprt and digit.
Also, if you only have oprt and digit "types" in your tree, you could sacrifice the first bit of precision as a flag for digit or oprt and save the space needed for unsigned char entity_flag. If you use a single 4-byte int for both oprt and digit and use the first bit to encode the type, you can extract a digit as follows (using the union pattern proposed in this thread):
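typedef union expr expr; /* forward typedef so the members can point to expr */
union expr {
struct {
int code;
expr * l_expr;
expr * r_expr;
} oprt;
struct {
int val;
} digit;
};
expr *x; /* assumed to point at a digit-type node; int assumed to be 32 bits */
unsigned int raw_digit = (unsigned int)x->digit.val & 0x7FFFFFFFu; // drop the type bit
int digit = (int)(raw_digit | ((0x40000000u & raw_digit) << 1)); // copy bit 30 into bit 31: preserves sign in 2's complement
x->digit.val = (int)((unsigned int)digit | 0x80000000u); // assuming MSB==1 means digit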
Using a union does not necessarily use more memory for digits. A digit only takes 4 bytes, so every time you need to allocate a digit-type expr you can simply call malloc(4), convert the result to expr *, and set the MSB to 1 accordingly. If you encode and decode expr pointers without bugs, you'll never try to reach beyond the 4th byte of a "digit" type expr ... hopefully. I don't recommend this solution if you need safety ^_^
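To check expr types easily, I believe you can use a bit-field inside the union (note that the order in which bit-fields are allocated within a unit is implementation-defined, so is_digit is not guaranteed to land on the MSB):
typedef union expr expr; /* forward typedef so the members can point to expr */
union expr {
struct {
int code;
expr * l_expr;
expr * r_expr;
} oprt;
struct {
int val;
} digit;
struct {
unsigned int is_digit : 1;
int : 31; //unused
} type;
};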
For example, I have the following structure:
typedef struct st
{
char x1;
char x2;
...
char y;
}*st_p;
This definition of structure st may vary over time (it is not guaranteed what will be defined inside ... as development goes on).
Is there a way to guarantee that char y is always aligned on – for example – a 256-byte boundary?
I guess the solution is something like adding an array like the following:
typedef struct st
{
char x1;
char x2;
...
char padding[keyword];
char y;
}*st_p;
Is there such a keyword which calculates the total size of all previous members in this structure and which thereby helps resolve the number of bytes to pad?
Since C11, the C language includes the _Alignas specifier [1], with which you can specify a desired (minimum) alignment requirement for any object, including a member of a structure.
So, in your case, you can skip the 'manual' padding and just add that specifier to the char y; declaration:
typedef struct st {
char x1;
char x2;
//...
_Alignas(256) char y;
}*st_p;
[1] However, note that the C Standard does not require an implementation to support so-called "extended alignment", that is, an alignment requirement greater than that of max_align_t (typically 8 or 16 bytes). What it does require, though, is that the specified alignment be a positive power of 2 (which is true in your case).
As an addendum/alternative, I shall also address your direct question:
Is there such a keyword which calculates the total size of all previous members in this structure and which thereby helps resolve the number of bytes to pad?
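The 'keyword' (actually, an operator) that could be used here is sizeof, so long as you put "all previous members" inside an anonymous struct. Note that an unnamed struct member that carries a tag, as struct inner does below, is a Microsoft-style extension (GCC and Clang accept it with -fms-extensions); standard C11 anonymous structs have no tag. Here's an example: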
#include <stdio.h>
#include <stddef.h>
typedef struct st {
struct inner {
char x1;
char x2;
char x3[714];
//...
};
char padding[256 - sizeof(struct inner) % 256];
char y;
}*st_p;
int main()
{
struct st test;
test.x1 = 'A'; // You can access the anonymous structure members as though
test.x2 = 'B'; // they were 'direct' members of the enclosing structure.
printf("%c %c %zu\n", test.x1, test.x2, offsetof(struct st, y));// Offset is 768
return 0;
}
But note that, in this code, although the offset of y will be a multiple of 256 bytes, that doesn't guarantee that y will be 256-byte aligned. That will only be true in an instance of the st structure that is itself aligned on a 256-byte boundary … and to ensure that, you will need an alignment attribute on the structure itself.
It is possible to use the compiler attributes to tell it to align a particular structure field to a specific bound. For example, if you are using GCC, you can use:
__attribute__((aligned(n)))
Thus, your example would become:
typedef struct st
{
char x1;
char x2;
...
char __attribute__((aligned(256))) y;
}*st_p;
If you have to do this at several places, you can even imagine defining a special type for this, something like:
typedef char __attribute__((aligned(256))) aligned_char;
typedef struct st
{
char x1;
char x2;
...
aligned_char y;
}*st_p;
This is not limited to structure fields; you can also apply it to local or global variables.
However, keep in mind that this may not work with all compilers, as attributes are bound to the compiler and not to the C language.
If you are using MSVC, you may be able to replicate the behavior with
__declspec(align(x))
You could use a union, although then there is no .padding[] member. You could also add a check that the offset is correct, in case the first struct is larger than the desired offset.
#include <stddef.h> /* offsetof */
#include <stdio.h>
#define OFFSET 256 /* Does not need to be a power-of-2 */
typedef struct st {
union {
struct {
char x1;
char x2;
};
struct {
char dummy[OFFSET];
};
};
char y;
} st;
void bar(void) {
st o1;
o1.x1 = '1';
o1.x2 = '2';
o1.y = 'y';
printf("%p\n", (void *) &o1);
printf("%p\n", (void *) &o1.y);
}
_Static_assert(offsetof(struct st, y) == OFFSET, "TBD code");
Sample output
0xffffcac0
0xffffcbc0
This does not ensure that the alignment is 256, only that the offset of member y is 256, which I think is OP's true goal. @Adrian Mole
I have a C function which needs a large number of variables to be passed, so I came up with the idea of "packing" them all into a single array (a matrix of variables). The point is, these variables are of very different types: some int, some arrays (strings and vectors), and many of them float. Is there a way to leave the type of the data stored in the matrix unspecified? (I unsuccessfully explored the void "data type".)
The elements of an array are always of a single type, that's the point.
Collecting variables of multiple types is the job for a structure, i.e. a struct.
This is a quite common way to solve this particular problem. If the structure becomes large, you might find it convenient to pass a pointer to an instance of it, rather than copying the entire thing in the call.
You can use va_list, but a struct is the best way to do it:
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>
enum type {
INT,
FLOAT,
PCHAR,
};
struct any_type {
enum type type_;
union {
int int_;
float float_;
char* pchar_;
};
};
#define MYSIZE 10
void process(size_t size, struct any_type* array)
{
for(size_t i = 0; i < size; i++) {
switch(array[i].type_) {
case INT :
printf("INT: %d\n", array[i].int_);
break;
case FLOAT :
printf("FLOAT: %f\n", array[i].float_);
break;
case PCHAR :
printf("PCHAR: %s\n", array[i].pchar_);
break;
default:
printf("UNKNOWN TYPE PROVIDED\n");
break;
}
}
}
int main(int argc, char *argv[])
{
struct any_type *array;
array = malloc(MYSIZE*(sizeof(struct any_type)));
array[0].type_ = INT;
array[0].int_ = 10;
array[1].type_ = FLOAT;
array[1].float_ = 2.5;
array[2].type_ = PCHAR;
array[2].pchar_ = "hello char";
process(3, array);
free(array);
return 0;
}
You can extend the enum and the union as needed. However, anonymous unions require C11 (for example, -std=c11).
Expanding on my comment above:
Needing to pass a large number of parameters [1] to a function can be a sign that there is a problem in your design - your function may be trying to do too many things at once, and you would be better off refactoring it into several smaller functions, each of which only takes a subset of the parameters.
Assuming that's not the case, how are your parameters logically related to each other? Can they be considered attributes of a single data item? For example, a person may be described by the following attributes: surname, given name, birth date, sex. These can be collected together into a single struct type such as
#include <string.h>
#include <time.h>
struct person {
char *surname;
char *name;
struct tm birthdate; // struct tm defined in time.h
char sex;
};
void write_to( struct person *p )
{
p->surname = strdup( "McGillicuddy" ); // strdup is POSIX (standard C only since C23)
p->name = strdup( "Aloysius" );
p->sex = 'M';
p->birthdate.tm_year = 32; // tm_year counts from 1900, so this is 1932
p->birthdate.tm_mon = 11; // december (months are 0-based)
p->birthdate.tm_mday = 1; // day of the month
}
int main( void )
{
struct person p;
...
write_to( &p );
...
}
Note that members of struct types can themselves be struct types - struct tm is a type defined in time.h that specifies a datetime value using multiple attributes.
Some notes on syntax:
When you want to access a member of a struct instance, use the . operator. When you want to access a member of a struct through a pointer, use the -> operator. In the function write_to, p is a pointer to struct person, so to access each member of p we use ->. The birthdate member is an instance of struct tm, not a pointer, so we use the . operator to access each member of birthdate.
p->m is equivalent to (*p).m.
Like I said in my comment, you should not collect otherwise unrelated items into a struct type just to reduce the number of parameters being passed to a function. They should all be attributes of a more complex type. Some other examples of what I mean:
// A node in a list
struct node {
data_t data; // for some data type data_t;
struct node *next;
struct node *prev;
};
// A street address
struct addr {
char *number; // to handle things like 102A, 102B
char *street;
char *city;
char state[3];
char *zip;
};
It's possible that you're really passing only a couple of distinct data items to your function, each of which is composed of a lot of different attributes. Take a step back and look at your variables and see how they relate to each other.
[1] "Large" depends on context, and of course there are always exceptions to any rule, but in general passing more than 7 distinct, unrelated parameters is a sign you may need to refactor your function into several smaller functions.
For example, for a structure:
struct name{
int a;
char b;
float c;
}g;
g.b='X';
Now I would like to access structure member b using bitwise operators (<<, >>, etc.) and change it to 'A'.
Is it possible to access structure members using such operators?
Bitwise operations on a whole struct don't make much sense: padding makes the layout implementation-dependent, and, more importantly, they defeat the purpose of having a struct in the first place. Bitwise operators, as the name says, operate on the bits of a single variable. Struct members are usually padded (unless the struct is packed), so without packing you have no guarantee where each member sits. That said, it can be done: you would have to reinterpret struct g as, say, a 32-bit value, and if the members you care about fall inside that value you can use bit operations on it. If necessary, you can create a union with your struct as one member and a raw variable as the other, and then apply bitwise operations to the raw variable.
You can change the data by getting the offset of b. I know this code does not look good, but it serves your purpose.
#include <stdio.h>
#include <stddef.h> /* offsetof */
struct name
{
    int a;
    char b;
    float c;
}g;
int main(void)
{
    g.b='X';
    size_t offset = offsetof(struct name, b); // offset of b in bytes
    unsigned char *p = (unsigned char *)&g + offset; // now points at g.b
    *p ^= ('X' ^ 'A'); // flip exactly the bits in which 'X' and 'A' differ
    // *p = 'A';
    printf("........%c\n",g.b);
    return 0;
}
There's a way: copy the structure's contents into a variable of the same size as your struct, manipulate that variable, and finally copy its contents back into the struct using memcpy.
The typical C99 way to extend a struct is something like
struct Base {
int x;
/* ... */
};
struct Derived {
struct Base base_part;
int y;
/* ... */
};
Then we may cast a struct Derived * to a struct Base * and access x through it.
I want to access the base members of struct Derived * obj; directly, for example obj->x and obj->y. C11 provides anonymous structs, but as explained here we can use this feature only with anonymous definitions. So how about writing
#define BASE_BODY { \
int x; \
}
struct Base BASE_BODY;
struct Derived {
struct BASE_BODY;
int y;
};
Then I can access the Base members as if they were part of Derived, without any casts or intermediate members. I can still cast a Derived pointer to a Base pointer if I need to.
Is this acceptable? Are there any pitfalls?
There are pitfalls.
Consider:
#define BASE_BODY { \
double a; \
short b; \
}
struct Base BASE_BODY;
struct Derived {
struct BASE_BODY;
short c;
};
On some implementations it could be that sizeof(struct Base) == sizeof(struct Derived), but:
struct Base {
double a;
// Padding here
short b;
};
struct Derived {
double a;
short b;
short c;
};
There is no guarantee that the memory layout at the beginning of the two structs is the same. Therefore you cannot pass this kind of Derived * to a function expecting a Base * and expect it to work.
Even if padding did not disturb the layout, there is still a potential problem with trap representations:
Suppose again that sizeof(Base) == sizeof(Derived), but c ends up in an area that is covered by the padding at the end of Base. Passing a pointer to this struct to a function that expects a Base * and modifies it might affect the padding bytes too (padding has unspecified values), thus possibly corrupting c and maybe even creating a trap representation.
I am attempting to learn more about C and its arcane hidden powers, and I attempted to make a sample struct containing a void pointer, intended for use as an array.
EDIT: Important note: This is for raw C code.
Let's say I have this struct.
typedef struct mystruct {
unsigned char foo;
unsigned int max;
enum data_t type;
void* data;
} mystruct;
I want data to hold max elements of either unsigned char, unsigned short int, or unsigned long int; the data_t enum contains values for those 3 cases.
enum data_t {gi8, gi16, gi32}; //For 8, 16 and 32 bit uints.
Then I have this function that initializes and allocates one of these structs, and is supposed to return a pointer to the new struct.
mystruct* new(unsigned char foo, unsigned int bar, long value) {
mystruct* new;
new = malloc(sizeof(mystruct)); //Allocate space for the struct.
assert(new != NULL);
new->foo = foo;
new->max = bar;
int i;
switch(type){
case gi8: default:
new->data = (unsigned char *)calloc(new->max, sizeof(unsigned char));
assert(new->data != NULL);
for(i = 0; i < new->max; i++){
*((unsigned char*)new->data + i) = (unsigned char)value;
//Can I do anything with the format new->data[n]? I can't seem
//to use the [] shortcut to point to members in this case!
}
break;
}
return new;
}
The compiler gives no warnings, but I am not too sure about this method. Is it a legitimate way to use pointers?
Is there a better way?
I forgot to show how it is called, e.g. mystruct* P; P = new(0,50,1024);
Unions are interesting but not what I wanted. Since I will have to handle every specific case individually anyway, casting seems as good as a union. I specifically wanted to have much larger 8-bit arrays than 32-bit arrays, so a union doesn't seem to help. For that I'd just make it an array of longs :P
No, you cannot dereference a void* pointer, it is forbidden by the C language standard. You have to cast it to a concrete pointer type before doing so.
As an alternative, depending on your needs, you can also use a union in your structure instead of a void*:
typedef struct mystruct {
unsigned char foo;
unsigned int max;
enum data_t type;
union {
unsigned char *uc;
unsigned short *us;
unsigned int *ui;
} data;
} mystruct;
At any given time, only one of data.uc, data.us, or data.ui is valid, as they all occupy the same space in memory. Then, you can use the appropriate member to get at your data array without having to cast from void*.
What about
typedef struct mystruct
{
unsigned char foo;
unsigned int max;
enum data_t type;
union
{
unsigned char *chars;
unsigned short *shortints;
unsigned long *longints;
};
} mystruct;
That way, there is no need to cast at all. Just use data_t to determine which of the pointers you want to access.
Is type supposed to be an argument to the function? (Don't name this function or any variable new or any C++ programmer who tries to use it will hunt you down)
If you want to use array indices, you can use a temporary pointer like this:
unsigned char *cdata = (unsigned char *)new->data;
cdata[i] = value;
I don't really see a problem with your approach. If you expect a particular size (which I think you do given the name gi8 etc.) I would suggest including stdint.h and using the typedefs uint8_t, uint16_t, and uint32_t.
A pointer is merely an address in the memory space. You can choose to interpret it however you wish. Review union for more information on how you can interpret the same memory location in multiple ways.
Casting between pointer types is common in C and C++, and the use of void* signals that you don't want users to dereference the pointer accidentally (dereferencing a void* is a compile error, but dereferencing the same pointer after casting it to int* is not).