I have questions about typecasting. This is just a dummy program shown here. The actual code is too big to be posted.
typedef struct abc
{
int a;
}abc_t;
main()
{
abc_t *MY_str;
char *p;
MY_str = (abc_t *)p;
}
Whenever I run the quality analysis check tool, I get a level 2 warning:
Casting to different object pointer type. REFERENCE - ISO:C90-6.3.4 Cast Operators - Semantics <next> Msg(3:3305) Pointer cast to stricter alignment. <next>
Can anyone please tell me how to resolve this issue?
Simple - your static analysis tool (which, btw?) has decided that a char* does not have a particular alignment requirement (it could point anywhere in memory) whereas an abc_t* likely has a word alignment requirement (int must be on a 4/8 byte boundary).
In reality, as the char* is on the stack, it will be word aligned on most architectures. Your tool cannot see this.
In your implementation (and probably many others) each int must be at an address that is divisible by sizeof int, which is often 4.
On the other hand, a char can be at any address.
It's like assigning 3.25 to an int variable. That's also not possible.
So when you have a bad pointer, you will probably get an exception from your machine, and technically this code invokes undefined behavior.
a char* can be aligned on any byte boundary, which means if you cast it to a structure, the alignment requirements of that struct might not be met (such as 16 byte boundaries required for SIMD types).
Your code is invalid C. If you find yourself doing something like this, it's probably the result of a greater misunderstanding. For instance I'm guessing you want to read an abc_t object from a file/socket/etc. and you're used to passing a char pointer to the read/recv/whatever function. Instead you should just declare an object of type abc_t and pass its address to whatever reading function you're using.
Related
Why do we have pointer types? eg
int *ptr;
I know its for type safety, eg to dereference 'ptr', the compiler needs to know that its dereferencing the ptr to type int, not to char or long, etc, but as others outlined here Why to specify a pointer type? , its also because "we should know how many bytes to read. Dereferencing a char pointer would imply taking one byte from memory while for int it could be 4 bytes." That makes sense.
But what if I have something like this:
typedef struct _IP_ADAPTER_INFO {
struct _IP_ADAPTER_INFO* Next;
DWORD ComboIndex;
char AdapterName[MAX_ADAPTER_NAME_LENGTH + 4];
char Description[MAX_ADAPTER_DESCRIPTION_LENGTH + 4];
UINT AddressLength;
BYTE Address[MAX_ADAPTER_ADDRESS_LENGTH];
DWORD Index;
UINT Type;
UINT DhcpEnabled;
PIP_ADDR_STRING CurrentIpAddress;
IP_ADDR_STRING IpAddressList;
IP_ADDR_STRING GatewayList;
IP_ADDR_STRING DhcpServer;
BOOL HaveWins;
IP_ADDR_STRING PrimaryWinsServer;
IP_ADDR_STRING SecondaryWinsServer;
time_t LeaseObtained;
time_t LeaseExpires;
} IP_ADAPTER_INFO, *PIP_ADAPTER_INFO;
PIP_ADAPTER_INFO pAdapterInfo = (IP_ADAPTER_INFO *)malloc(sizeof(IP_ADAPTER_INFO));
What would be the point of declaring the type PIP_ADAPTER_INFO here? After all, unlike the previous example, we've already allocated enough memory for the pointer to point at (using malloc), so isn't defining the type here redundant? We will be reading as much data from memory as there has been allocated.
Also, side note: Is there any difference between the following 4 declarations or is there a best practice?
PIP_ADAPTER_INFO pAdapterInfo = (IP_ADAPTER_INFO *)malloc(sizeof(IP_ADAPTER_INFO));
or
PIP_ADAPTER_INFO pAdapterInfo = (PIP_ADAPTER_INFO)malloc(sizeof(IP_ADAPTER_INFO));
or
IP_ADAPTER_INFO *pAdapterInfo = (IP_ADAPTER_INFO *)malloc(sizeof(IP_ADAPTER_INFO));
or
IP_ADAPTER_INFO *pAdapterInfo = (PIP_ADAPTER_INFO)malloc(sizeof(IP_ADAPTER_INFO));
You’re kind of asking two different questions here - why have different pointer types, and why hide pointers behind typedefs?
The primary reason for distinct pointer types comes from pointer arithmetic - if p points to an object of type T, then the expression p + 1 points to the next object of that type. If p points to an 4-byte int, then p + 1 points to the next int. If p points to a 128-byte struct, then p + 1 points to the next 128-byte struct, and so on. Pointers are abstractions of memory addresses with additional type semantics.
As for hiding pointers behind typedefs...
A number of us (including myself) consider hiding pointers behind typedefs to be bad style if the user of the type still has to be aware of the type’s “pointer-ness” (i.e., if you ever have to dereference it, or if you ever assign the result of malloc/calloc/realloc to it, etc.). If you’re trying to abstract away the “pointer-ness” of something, you need to do it in more than just the declaration - you need to provide a full API that hides all the pointer operations as well.
As for your last question, best practice in C is to not cast the result of malloc. Best practice in C++ is to not use malloc at all.
I think this is more a question of type definition style than of dynamic memory allocation.
Old-school C practice is to describe structs by their tags. You say
struct foo {
...
};
and then
struct foo foovar;
or
struct foo *foopointer = malloc(sizeof(struct foo));
But a lot of people don't like having to type that keyword struct all the time. (I guess I can't fault then; C has always favored terseness, sometimes seemingly just to reduce typing.) So a form using typedef became quite popular (and it either influenced, or was influenced by, C++):
typedef struct {
...
} Foo;
and then
Foo foovar;
or
Foo *foopointer = malloc(sizeof(Foo));
But then, for reasons that are less clear, it became popular to throw the pointerness into the typedef, too, like this:
typedef struct {
...
} Foo, *Foop;
Foop foopointer = malloc(sizeof(*Foop));
But this is all a matter of style and personal preference, in the service of what someone imagines to be clarity or convenience or usefulness. (But of course opinions on clarity and convenience, like opinions on style, can legitimately vary.) I've seen the pointer typedefs disparaged as being a misleading or Microsoftian practice, but I'm not sure I can fault them right now.
You also asked about the casts, and we could also dissect various options for the sizeof call as the argument to malloc.
It doesn't really matter whether you say
Foop foopointer = (Foop)malloc(sizeof(*Foop));
or
Foop foopointer = (Foo *)malloc(sizeof(*Foop));
The first one may be clearer, in that you don't have to go back and check that Foop and Foo * are the same thing. But they're both poor practice in C, and in at least some circles they've been deprecated since the 1990's. Those casts are are considered distracting and unnecessary in straight C -- although of course they're necessary in C++, or I suppose if you're using a C++ compiler to compile C code. (If you were writing straight C++ code, of course, you'd typically use new instead of malloc.)
But then what should you put in the sizeof()? Which is better,
Foop foopointer = malloc(sizeof(*Foop));
or
Foop foopointer = malloc(sizeof(Foo));
Again, the first one can be easier to read, since you don't have to go back and check that Foop and Foo * are the same thing. But by the same token, there's a third form that can be even clearer:
Foop foopointer = malloc(sizeof(*foopointer));
Now you know that, whatever type foopointer points at, you're allocating the right amount of space for it. This idiom works best, though, if it's maximally clear that foopiinter is in fact a pointer that points at some type, meaning that the variants
Foo *foopointer = malloc(sizeof(*foopointer));
or even
struct foo *foopointer = malloc(sizeof(*foopointer));
can be considered clearer still -- and this may be one of the reasons people consider the pointer typedef to be less than perfectly useful.
Bottom line, if you're still with me: If you don't find PIP_ADAPTER_INFO useful, don't use it -- use IP_ADAPTER_INFO (along with explicit *'s when you need them) instead. Someone thought PIP_ADAPTER_INFO might be useful, which is why it's there, but the arguments in favor of its use aren't too compelling.
What is the point of “pointer types” when you dynamically allocate memory?
At least for the example you show there is none.
So the follow up question would be if there were situations where typedefing a pointer made sense.
And the answer is: Yes.
It definitely makes sense if one is in the need of an opaque data type.
A nice example is the pthread_t type which defines a handle to a POSIX thread.
Depending on the implementation it is defined as
typedef struct bla pthread_t;
typedef struct foo * pthread_t;
typedef long pthread_t;
and with this abstracts away the kind of implementation, as it is of no interest to the user, which probably is not the intention with the struct you show in your question.
Why do we have pointer types?
To accommodate architectures where the size and encoding may differ for various types. C ports well to many platforms, even novel ones.
It is not unusual today that pointers to functions have a different size than pointers to objects. An object pointer coverts to a void *, yet a function pointer may not.
A pointer to char need not be the same size as a pointer to an int or union or struct. This is uncommon today. The spec details follow (my emphasis):
A pointer to void shall have the same representation and alignment requirements as a
pointer to a character type. Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements. All
pointers to structure types shall have the same representation and alignment requirements
as each other. All pointers to union types shall have the same representation and
alignment requirements as each other. Pointers to other types need not have the same
representation or alignment requirements. C11dr §6.2.5 28
I know a pointer to one type may be converted to a pointer of another type. I have three questions:
What should kept in mind while typecasting pointers?
What are the exceptions/error may come in resulting pointer?
What are best practices to avoid exceptions/errors?
A program well written usually does not use much pointer typecasting. There could be a need to use ptr typecast for malloc for instance (declared (void *)malloc(...)), but it is not even necessary in C (while a few compilers may complain).
int *p = malloc(sizeof(int)); // no need of (int *)malloc(...)
However in system applications, sometimes you want to use a trick to perform binary or specific operation - and C, a language close to the machine structure, is convenient for that. For instance say you want to analyze the binary structure of a double (that follows thee IEEE 754 implementation), and working with binary elements is simpler, you may declare
typedef unsigned char byte;
double d = 0.9;
byte *p = (byte *)&d;
int i;
for (i=0 ; i<sizeof(double) ; i++) { ... work with b ... }
You may also use an union, this is an exemple.
A more complex utilisation could be the simulation of the C++ polymorphism, that requires to store the "classes" (structures) hierarchy somewhere to remember what is what, and perform pointer typecasting to have, for instance, a parent "class" pointer variable to point at some time to a derived class (see the C++ link also)
CRectangle rect;
CPolygon *p = (CPolygon *)▭
p->whatami = POLY_RECTANGLE; // a way to simulate polymorphism ...
process_poly ( p );
But in this case, maybe it's better to directly use C++!
Pointer typecast is to be used carefully for well determined situations that are part of the program analysis - before development starts.
Pointer typecast potential dangers
use them when it's not necessary - that is error prone and complexifies the program
pointing to an object of different size that may lead to an access overflow, wrong result...
pointer to two different structures like s1 *p = (s1 *)&s2; : relying on their size and alignment may lead to an error
(But to be fair, a skilled C programmer wouldn't commit the above mistakes...)
Best practice
use them only if you do need them, and comment the part well that explains why it is necessary
know what you are doing - again a skilled programmer may use tons of pointer typecasts without fail, i.e. don't try and see, it may work on such system / version / OS, and may not work on another one
In plain C you can cast any pointer type to any other pointer type. If you cast a pointer to or from an uncompatible type, and incorrectly write the memory, you may get a segmentation fault or unexpected results from your application.
Here is a sample code of casting structure pointers:
struct Entity {
int type;
}
struct DetailedEntity1 {
int type;
short val1;
}
struct DetailedEntity2 {
int type;
long val;
long val2;
}
// random code:
struct Entity* ent = (struct Entity*)ptr;
//bad:
struct DetailedEntity1* ent1 = (struct DetailedEntity1*)ent;
int a = ent->val; // may be an error here, invalid read
ent->val = 117; // possible invali write
//OK:
if (ent->type == DETAILED_ENTITY_1) {
((struct DetailedEntity1*)ent)->val1;
} else if (ent->type == DETAILED_ENTITY_2) {
((struct DetailedEntity2*)ent)->val2;
}
As for function pointers - you should always use functions which exactly fit the declaration. Otherwise you may get unexpected results or segfaults.
When casting from pointer to pointer (structure or not) you must ensure that the memory is aligned in the exact same way. When casting entire structures the best way to ensure it is to use the same order of the same variables at the start, and differentiating structures only after the "common header". Also remember, that memory alignment may differ from machine to machine, so you can't just send a struct pointer as a byte array and receive it as byte array. You may experience unexpected behaviour or even segfaults.
When casting smaller to larger variable pointers, you must be very careful. Consider this code:
char* ptr = malloc (16);
ptr++;
uint64_t* uintPtr = ptr; // may cause an error, memory is not properly aligned
And also, there is the strict aliasing rule that you should follow.
You probably need a look at ... the C-faq maintained by Steve Summit (which used to be posted in the newsgroups, which means it was read and updated by a lot of the best programmers at the time, sometimes the conceptors of the langage itself).
There is an abridged version too, which is maybe more palatable and still very, very, very, very useful. Reading the whole abridged is, I believe, mandatory if you use C.
I can't seem to understand the difference between the following to pointer notations, can someone please guide me?
typedef struct some_struct struct_name;
struct_name this;
char buf[50];
this = *((some_struct *)(buf));
Now I tried to play around a bit and did the above thing like:
struct some_struct * this;
char buf[50];
this=(struct some_struct *)buf;
As far as I am concerned I think both the implementations should generate the same result, Can someone guide me whether there is a difference between the two and if yes can some one point it out?
Thanks.
In your first snippet, this is not a pointer, it's an instance of some_struct. The assignment you made did a shallow copy (i.e. memcpy()) of what's in buf as if it were an instance of some_struct as well.
In the second snippet, this is a pointer, and it's just pointed to the address of buf.
So, basically to sum up, first snippet this is not a pointer and the struct is copied into it. In the second, it's a pointer and assigned to the same memory as buf (i.e. not a copy).
In the second one, "this" will point to the first memory location of "buf". In the first example, you will either get a compiler error (I don't think you can assign structs in C with =, I could be wrong though), or the contents of buf (up to sizeof(struct_name)) will be copied into this, which resides on the stack.
Both approaches have their problems.
alignment: your buf might not be properly aligned for a variable of the structure type. If so this will produce undefined behavior (UB): in the best case it aborts your program, but it may make much worse things than that.
initialization: in the first cases you access uninitialized memory for reading. In the best case that gives you unspecific data, that is some random bytes. In the worst case, char is a signed integer type on your platform and you hit a trap representation for char => UB as above. (Your second case will encounter the same problem, once you try to access the object at the other end of the pointer.)
How to avoid all that:
Always initialize your variables. A simple = { 0 } should do in all cases.
never use char as a generic type for bytes but use unsigned char
never cast a byte buffer of arbitrary alignment to another data type. If needed, do it the other way round, cast a struct object to unsigned char.
I have a linked list, which stores groups of settings for my application:
typedef struct settings {
struct settings* next;
char* name;
char* title;
char* desc;
char* bkfolder;
char* srclist;
char* arcall;
char* incfold;
} settings_row;
settings_row* first_profile = { 0 };
#define SETTINGS_PER_ROW 7
When I load values into this structure, I don't want to have to name all the elements. I would rather treat it like a named array -- the values are loaded in order from a file and placed incrementally into the struct. Then, when I need to use the values, I access them by name.
//putting values incrementally into the struct
void read_settings_file(settings_row* settings){
char* field = settings + sizeof(void*);
int i = 0;
while(read_value_into(field[i]) && i++ < SETTINGS_PER_ROW);
}
//accessing components by name
void settings_info(settings_row* settings){
printf("Settings 'profile': %s\n", settings.title);
printf("Description: %s\n", settings.desc);
printf("Folder to backup to: %s\n", settings.bkfolder);
}
But I wonder, since these are all pointers (and there will only ever be pointers in this struct), will the compiler add padding to any of these values? Are they guaranteed to be in this order, and have nothing between the values? Will my approach work sometimes, but fail intermittently?
edit for clarification
I realize that the compiler can pad any values of a struct--but given the nature of the struct (a struct of pointers) I thought this might not be a problem. Since the most efficient way for a 32 bit processor to address data is in 32 bit chunks, this is how the compiler pads values in a struct (ie. an int, short, int in a struct will add 2 bytes of padding after the short, to make it into a 32 bit chunk, and align the next int to the next 32 bit chunk). But since a 32 bit processor uses 32 bit addresses (and a 64 bit processor uses 64 bit addresses (I think)), would padding be totally unnecessary since all of the values of the struct (addresses, which are efficient by their very nature) are in ideal 32 bit chunks?
I am hoping some memory-representation / compiler-behavior guru can come shed some light on whether a compiler would ever have a reason to pad these values
Under POSIX rules, all pointers (both function pointers and data pointers) are all required to be the same size; under just ISO C, all data pointers are convertible to 'void *' and back without loss of information (but function pointers need not be convertible to 'void *' without loss of information, nor vice versa).
Therefore, if written correctly, your code would work. It isn't written quite correctly, though! Consider:
void read_settings_file(settings_row* settings)
{
char* field = settings + sizeof(void*);
int i = 0;
while(read_value_into(field[i]) && i++ < SETTINGS_PER_ROW)
;
}
Let's assume you're using a 32-bit machine with 8-bit characters; the argument is not all that significantly different if you're using 64-bit machines. The assignment to 'field' is all wrong, because settings + 4 is a pointer to the 5th element (counting from 0) of an array of 'settings_row' structures. What you need to write is:
void read_settings_file(settings_row* settings)
{
char* field = (char *)settings + sizeof(void*);
int i = 0;
while(read_value_into(field[i]) && i++ < SETTINGS_PER_ROW)
;
}
The cast before addition is crucial!
C Standard (ISO/IEC 9899:1999):
6.3.2.3 Pointers
A pointer to void may be converted to or from a pointer to any incomplete or object
type. A pointer to any incomplete or object type may be converted to a pointer to void
and back again; the result shall compare equal to the original pointer.
[...]
A pointer to a function of one type may be converted to a pointer to a function of another
type and back again; the result shall compare equal to the original pointer. If a converted
pointer is used to call a function whose type is not compatible with the pointed-to type,
the behavior is undefined.
In many cases pointers are natural word sizes, so the compiler is unlikely to pad each member, but that doesn't make it a good idea. If you want to treat it like an array you should use an array.
I'm thinking out loud here so there's probably many mistakes but perhaps you could try this approach:
enum
{
kName = 0,
kTitle,
kDesc,
kBkFolder,
kSrcList,
kArcAll,
kIncFold,
kSettingsCount
};
typedef struct settings {
struct settings* next;
char *settingsdata[kSettingsCount];
} settings_row;
Set the data:
settings_row myRow;
myRow.settingsData[kName] = "Bob";
myRow.settingsData[kDescription] = "Hurrrrr";
...
Reading the data:
void read_settings_file(settings_row* settings){
char** field = settings->settingsData;
int i = 0;
while(read_value_into(field[i]) && i++ < SETTINGS_PER_ROW);
}
It's not guaranteed by the C standard. I've a sneaking suspicion, that I don't have time to check right now either way, that it guarantees no padding between the char* fields, i.e. that consecutive fields of the same type in a struct are guaranteed to be layout-compatible with an array of that type. But even if so, you're on your own between the settings* and the first char*, and also between the last char* and the end of the struct. But you could use offsetof to deal with the first issue, and I don't think the second affects your current code.
However, what you want is almost certainly guaranteed by your compiler, which somewhere in its documentation will set out its rules for struct layout, and will almost certainly say that all pointers to data are word sized, and that a struct can be the size of 8 words without additional padding. But if you want to write highly portable code, you have to use only the guarantees in the standard.
The order of fields is guaranteed. I also don't think you'll see intermittent failure - AFAIK the offset of each field in that struct will be consistent for a given implementation (meaning the combination of compiler and platform).
You could assert that sizeof(settings*) == sizeof(char*) and sizeof(settings_row) == sizeof(char*)*8. If both those hold, there is no room for any padding in the struct, since fields are not allowed to "overlap". If you ever hit a platform where they don't hold, you'll find out.
Even so, if you want an array, I'd be inclined to say use an array, with inline accessor functions or macros to get the individual fields. Whether your trick works or not, it's even easier not to think about it at all.
Although not a duplicate, this probably answers your question:
Why isn't sizeof for a struct equal to the sum of sizeof of each member?
It's not uncommon for applications to write an entire struct into a file and read it back out again. But this suffers from the possibility that one day the file will need to be read back on another platform, or by another version of the compiler that packs the struct differently. (Although this can be dealt with by specially-written code that understands the original packing format).
Technically, you can rely only on the order; the compiler could insert padding. If different pointers were of different size, or if the pointer size wasn't a natural word size, it might insert padding.
Practically speaking, you could get away with it. I wouldn't recommend it; it's a bad, dirty trick.
You could achieve your goal with another level of indirection (what doesn't that solve?), or by using a temporary array initialized to point to the various members of the structure.
It's not guaranteed, but it will work fine in most cases. It won't be intermittent, it will either work or not work on a particular platform with a particular build. Since you're using all pointers, most compilers won't mess with any padding.
Also, if you wanted to be safer, you could make it a union.
You can't do that the way you are trying. The compiler is allowed to pad any and all members of the struct. I do not believe it is allowed to reorder the fields.
Most compilers have an attribute that can be applied to the struct to pack it (ie to turn it into a collection of tightly packed storage with no padding), but the downside is that this generally affects performance. The packed flag will probably allow you to use the struct the way you want, but it may not be portable across various platforms.
Padding is designed to make field access as efficient as possible on the target architecture. It's best not to fight it unless you have to (ie, the struct goes to a disk or over a network.)
It seems to me that this approach creates more problems than it solves.
When you read this code six months from now, will you still be aware of all the subtleties of how the compiler pads a struct?
Would someone else, who didn't write the code?
If you must use the struct, use it in the canonical way and just write a function which
assigns values to each field separately.
You could also use an array and create macros to give field names to indices.
If you get too "clever" about optimizing your code, you will end up with slower code anyway, since the compiler won't be able to optimize it as well.
I have a C application that I've created in VS2008. I am creating a mock creation function that overrides function references in a struct. However if I try and do this in a straight forward fashion with something like:
void *ptr = &(*env)->GetVersion;
*ptr = <address of new function>
then I get a "error C2100: illegal indirection" error as *ptr, when ptr is a void * seems to be a banned construct. I can get around it by using a int/long pointer as well, mapping that to the same address and modifying the contents of the long pointer:
*structOffsetPointer = &(*env)->GetVersion;
functionPointer = thisGetVersion;
structOffsetPointerAsLong = (long *)structOffsetPointer;
*structOffsetPointerAsLong = (long)functionPointer;
but I am concerned that using long or int pointers will cause problems if I switch between 32 and 64 bit environments.
So is there are easy way to disable this error? Assuming not, is either int or long 64 bits under win64?
Then how about:
void **ptr = (void **) &(*env)->GetVersion;
*ptr = <address of new function>
The right way to do this is to work with the type system, avoid all the casting and declare actual pointers to functions like:
typedef int (*fncPtr)(void);
fncPtr *ptr = &(*env)->GetVersion;
*ptr = NewFunction;
The above assumes GetVersion is of type fncPtr and NewFunction is declared as
int NewFunction(void);
When dereferencing a "void *", you are left with a "void" which is has no size (or really no type for that matter), so it doesn't know how to assign something to it. It is the same as:
void blah = 0xdeadbabe; // let's assume a 32-bit addressing system
To add to my own response and give a solution, I would give it the proper type of a pointer to a function of the type GetVersion is. If GetVersion that your "env" struct field is pointing to is:
int GetVersion();
then you want:
int (**ptr)() = &(*env)->GetVersion;
Last time I played with void* & C under visual studio, VS didn't play nicely.
Here are some information datapoints:
A pointer is always the size of the system word(8/16/32/64)...(unless you have segmented memory, which I'm assuming you don't have). This is because it needs to point to anywhere in the memory space. For a von Neumann machine, a function pointer is going to be the same size as a data pointer, because data and code occupy the same memory space. This is not guaranteed under a Harvard architecture. I'm not familiar enough with Windows Vista to know if it programatically fakes out a Harvard architecture for security reasons.
I personally would not disable this error, just for the sake of letting the compiler do its job.
As mentioned, to store the address of a function in a pointer you should simply not do the indirection.
However, you also talk about being worried about the size of an int type that you might store a pointer into (which generally is not something you want to do unless you have a really good reason to).
If you want to hold a pointer in an int type for some reason, then on Windows a UINT_PTR type (or uintptr_t from stdint.h if you have it) will hold most pointer types (I don't think it's necessarily large enough to hold some pointer-to-member types).