Proper way of using simple integer and memsizes - c

I would like to know what is the proper way of using simple integer and memsize?To be precise,
I have a C code initially written for 32 bit architecture. Now it has to run into both the architecture, So there is obvious reason to get the following warning,while running in 64 bit architecture
warning: cast to pointer from integer of different size
I am trying to remove those warnings using the memsize, intptr_t and uintptr_t. But I have a doubt if it works properly if we use mixed simple integer and memsizes. I would like to know the proper way of using it. Following is the sample of code.
compllits = list_Cons((POINTER) predindex, compllits);
Here compllits is a linked list and is defined as pointer . list_Cons returns pointer. list_Cons is defined as:
list_Cons(POINTER x, LIST y);
And, int preindex. I am casting the integer into Pointer. As I run it in 64-bit machine , I will get the warning
: warning: cast to pointer from integer of different size
Now to resolve this warning, I am liitle bit confused in the two methods I am using ,
Method 1: changing the int preindex into intptr_t preindex.
Method 2. Keeping int preindex unchanged but doing following
compllits = list_Cons((POINTER)(intptr_t)predindex, compllits);
Both the ways are working. But I am not sure which method is legal and best?
Looking for some suggestions.
Thanks

The big question is if you really have to mix pointers and integers. (The few cases where this is the case is when handling lisp-like generic data structures.) If not, you should use the correct type, and that type only.
However, if this is the case, do you really need to handle them using the same function? For example, you could have list_Cons_pointer and list_Cons_int that accept a real pointer and an integer type matching preindexed, respectively.
Whether or not you should change the type of preindexed really depends on what it represents in the program.
Apart from this, an intptr_t is guaranteed to be large enough to hold a pointer, but it might be larger. This means that there is really no way to get rid of all warnings in all possible environments (think 48 bit pointers...)

Is preindex really a pointer? If so then your problem is using int as a pointer type. Use int *.
Also, I'd recommend using int * rather than intptr_t. intptr_t is an integer that is wide enough to hold a pointer, but semantically it is still an integer.
On a 32bit machine, int is 32 bits wide and int * is also 32 bits wide. On a 64 bit machine int is still 32 bits wide, but int * is 64 bits wide.

Related

Assigning a short to int * fails

I understand that I can reassign a variable to a bigger type if it fits, ad its ok to do it. For example:
short s = 2;
int i = s;
long l = i;
long long ll = l;
When I try to do it with pointers it fails and I don't understand why. I have integers that I pass as arguments to functions expecting a pointer to a long long. And it hasn't failed, yet..
The other day I was going from short to int, and something weird happens, I hope someone can I explain it to me. This would be the minimal code to reproduce.
short s = 2;
int* ptr_i = &s; // here ptr_i is the pointer to s, ok , but *ptr_i is definitely not 2
When I try to do it with pointers it fails and I don't understand why.
A major purpose of the type system in C is to reduce programming mistakes. A default conversion may be disallowed or diagnosed because it is symptomatic of a mistake, not because the value cannot be converted.
In int *ptr_i = &s;, &s is the address of a short, typically a 16-bit integer. If ptr_i is set to point to the same memory and *ptr_i is used, it attempts to refer to an int at that address, typically a 32-bit integer. This is generally an error; loading a 32-bit integer from a place where there is a 16-bit integer, and we do not know what is beyond it, is not usually a desired operation. The C standard does not define the behavior when this is attempted.
In fact, there are multiple things that can go wrong with this:
As described above, using *ptr_i when we only know there is a short there may produce undesired results.
The short object may have alignment that is not suitable for an int, which can cause a problem either with the pointer conversion or with using the converted pointer.
The C standard does not define the result of converting short * to int * except that, if it is properly aligned for int, the result can be converted back to short * to produce a value equal to the original pointer.
Even if short and int are the same width, say 32 bits, and the alignment is good, the C standard has rules about aliasing that allow the compiler to assume that an int * never accesses an object that was defined as short. In consequence, optimization of your program may transform it in unexpected ways.
I have integers that I pass as arguments to functions expecting a pointer to a long long.
C does allow default conversions of integers to integers that are the same width or wider, because these are not usually mistakes.

should pointers be signed or unsigned in c

I have a function get_picture() that takes a picture. It returns a pointer of type uint8_t (where the pciture is stored) and takes a pointer to a variable
that stores the length of the picture.
Here is the declaration:
uint8_t * get_picture(int *piclength)
Here I call it in main():
unsigned int address, value;
address = (unsigned int)get_picture((int*)& value);
My question is - becuase address is storing an address (which is positive) should I actually define it as an int.
I'm not sure you understand pointers.
If your function returns a uint8_t * then you should be storing it in uint8_t * not an int.
As an example:
uint8_t* get_picture(int* piclength);
int piclength;
uint8_t* address;
address = get_picture(&piclength);
If you really want to convert a data-pointer to an integer, use the dedicated typedef instead of some random (and possibly too small) type:
uintptr_t / intptr_t (Optional typedefs in <stdint.h>)
Still, the need is rare, and I don't see it here.
It depends on what you are really after. Your code is fine as is if you want address to contain the address of where that picture lives. Likewise you could use an int, since bits is bits, int is the same number of bits as unsigned int and whatever consumes address can be fed those bits. It makes more sense as a human to think of addresses as unsigned, but the compiler and hardware don't care, bits is bits.
But depending on what you are doing you may want to as mentioned already, preserve this address using a pointer of the same type. See Dragan's answer.
If you want to "see" the address then it depends on how you want to see it, converting it to an unsigned int is one easy and generic way to do it.
Yes, this is very system dependent and the size of int varies by toolchain and target and may or may not completely hold an address for that system, so some masking may be required by the consumer of that variable.
So your code is fine, I think I understand the question. Signed or unsigned is in the eye of the beholder, it is only unsigned or signed for particular specific operations. Addresses are not themselves signed nor unsigned, they are just bits on an address bus. For a sane compiler unsigned int and int are the same size, store the same number of bits so long as this compiler defines them as at least the size of the address that this compiler uses for a pointer, then this will work just fine with int or unsigned int. Int feels a little wrong, unsigned int feels right, but those are human emotions. The hardware doesn't care, so long as the bits dont change on their way to the address bus. Now if for some reason the code we don't see prints this variable as a decimal for example printf("%d\n",address); (why would you printf on a microcontroller?) then it may look strange to humans but will still be the right decimal interpretation of the bit pattern than is the address. printf("0x%X\n",address); would make more sense and be more generic. if your printf supports it you could just printf("%p",address); using Dragan's uint8_t * address declaration, which is what many folks here are probably thinking based on classical C training. vs bits are bits and have no meaning whatsoever to the hardware until used, and only for that use case, an address is only an address on the address bus, when doing math on it to compute another address it is not an address it is a bit pattern being fed into the alu, signed or unsigned might depend on the operation (add and subtract dont know signed from unsigned, multiply and divide do).
If you choose to not to use uint8_t * address as a declaration, then unsigned int "feels" better, less likely to mess you up (if you have enough bits in an (unsigned) int for that compiler to store an address in the first place). A signed int feels a little wrong, but technically should work. My rule is only use signed when you specifically need signed, otherwise use unsigned everywhere else, saves on a lot of bugs. Unfortunately traditionally C libraries do it the other way around, making a big mess before the stdint.h stuff came about.

When printf is an address of a variable, why use void*?

I saw some usage of (void*) in printf().
If I want to print a variable's address, can I do it like this:
int a = 19;
printf("%d", &a);
I think, &a is a's address which is just an integer, right?
Many articles I read use something like this:
printf("%p", (void*)&a);
What does %p stand for? (A pointer?)
Why use (void*)? Can't I use (int)&a instead?
Pointers are not numbers. They are often internally represented that way, but they are conceptually distinct.
void* is designed to be a generic pointer type. Any pointer value (other than a function pointer) may be converted to void* and back again without loss of information. This typically means that void* is at least as big as other pointer types.
printfs "%p" format requires an argument of type void*. That's why an int* should be cast to void* in that context. (There's no implicit conversion because it's a variadic function; there's no declared parameter, so the compiler doesn't know what to convert it to.)
Sloppy practices like printing pointers with "%d", or passing an int* to printf with a "%p" format, are things that you can probably get away with on most current systems, but they render your code non-portable. (Note that it's common on 64-bit systems for void* and int to be different sizes, so printing pointers with %d" is really non-portable, not just theoretically.)
Incidentally, the output format for "%p" is implementation-defined. Hexadecimal is common, (in upper or lower case, with or without a leading "0x" or "0X"), but it's not the only possibility. All you can count on is that, assuming a reasonable implementation, it will be a reasonable way to represent a pointer value in human-readable form (and that scanf will understand the output of printf).
The article you read is entirely correct. The correct way to print an int* value is
printf("%p", (void*)&a);
Don't take the lazy way out; it's not at all difficult to get it right.
Suggested reading: Section 4 of the comp.lang.c FAQ. (Further suggested reading: All the other sections.
EDIT:
In response to Alcott's question:
There is still one thing I don't quite understand. int a = 10; int *p = &a;, so p's value is a's address in mem, right? If right, then p's value will range from 0 to 2^32-1 (if cpu is 32-bit), and an integer is 4-byte on 32-bit OS, right? then What's the difference between the p's value and an integer? Can p's value go out of the range?
The difference is that they're of different types.
Assume a system on which int, int*, void*, and float are all 32 bits (this is typical for current 32-bit systems). Does the fact that float is 32 bits imply that its range is 0 to 232-1? Or -231 to 231-1? Certainly not; the range of float (assuming IEEE representation) is approximately -3.40282e+38 to +3.40282e+38, with widely varying resolution across the range, plus exotic values like negative zero, subnormalized numbers, denormalized numbers, infinities, and NaNs (Not-a-Number). int and float are both 32 bits, and you can take the 32 bits of a float object and treat it as an int representation, but the result won't have any straightforward relationship to the value of the float. The second low-order bit of an int, for example, has a specific meaning; it contributes 0 to the value if it's 0, and 2 to the value if it's 1; the corresponding bit of a float has a meaning, but it's quite different (it contributes a value that depends on the value of the exponent).
The situation with pointers is quite similar. A pointer value has a meaning: it's the address of some object (or any of several other things, but we'll set that aside for now). On most current systems, interpreting the bits of a pointer object as if it were an integer gives you something that makes sense on the machine level. But the language itself does not guarantee, or even hint, that that's the case.
Pointers are not numbers.
A concrete example: some years ago, I ran across some code that tried to compute the difference in bytes between two addresses by casting to integers. It was something like this:
unsigned char *p0;
unsigned char *p1;
long difference = (unsigned long)p1 - (unsigned long)p0;
If you assume that pointers are just numbers, representing addresses in a linear monolithic address space, then this code makes sense. But that assumption is not supported by the language. And in fact, there was a system on which that code was intended to run (the Cray T90) on which it simply would not have worked. The T90 had 64-bit pointers pointing to 64-bit words. Byte pointers were synthesized in software by storing an offset in the 3 high-order bits of a pointer object. Subtracting two pointers in the above manner, if they both had 0 offsets, would give you the number of words, not bytes, between the addresses. And if they had non-0 offsets, it would give you meaningless garbage. (Conversion from a pointer to an integer would just copy the bits; it could have done the work to give you a meaningful byte index, but it didn't.)
The solution was simple: drop the casts and use pointer arithmetic:
long difference = p1 - p0;
Other addressing schemes are possible. For example, an address might consist of a descriptor that (perhaps indirectly) references a block of memory, plus an offset within that block.
You can assume that addresses are just numbers, that the address space is linear and monolithic, that all pointers are the same size and have the same representation, that a pointer can be safely converted to int, or to long, and back again without loss of information. And the code you write based on those assumptions will probably work on most current systems. But it's entirely possible that some future systems will again use a different memory model, and your code will break.
If you avoid making any assumptions beyond what the language actually guarantees, your code will be far more future-proof. And even leaving portability issues aside, it will probably be cleaner.
So much insanity present here...
%p is generally the correct format specifier to use if you just want to print out a representation of the pointer. Never, ever use %d.
The length of an int and the length of a pointer (void* or otherwise) have no relationship. Most data models on i386 just happen to have 32-bit ints AND 32-bit pointers -- other platforms, including x86-64, are not the same! (This is also historically known as "all the world's a VAX syndrome".) http://en.wikipedia.org/wiki/64-bit#64-bit_data_models
If for some reason you want to hold a memory address in an integral variable, use the right types! intptr_t and uintptr_t. They're in stdint.h. See http://en.wikipedia.org/wiki/Stdint.h#Integers_wide_enough_to_hold_pointers
In C void * is an un-typed pointer. void does not mean void... it means anything. Thus casting to void * would be the same as casting to "pointer" in another language.
Using (int *)&a should work too... but the stylistic point of saying (void *) is to say -- I don't care about the type -- just that it is a pointer.
Note: It is possible for an implementation of C to cause this construct to fail and still meet the requirements of the standards. I don't know of any such implementations, but it is possible.
Although it the vast majority of C implementations store pointers to all kinds of objects using the same representation, the C Standard does not require that all implementations do so, nor does it even provide any means by which a program which would exploit commonality of representations could test whether an implementation follows the common practice and refuse to run if an implementation doesn't.
If on some particular platform, an int* held a word address, while both char* and void* combine a word address with a word that identifies a byte within a word, passing an int* to a function that is expecting to retrieve a variadic argument of type char* or void* would result in that function trying to fetch more data from the stack (a word address plus the supplemental word) than had been pushed (just the word address). This could cause the system to malfunction in unpredictable ways.
Many compilers for commonplace platforms that use the same representation for all pointers will process an action which passes a non-void pointer precisely the same way as they would process an action which casts the pointer to void* before passing it. They thus have no reason to care about whether the pointer type that is passed as a variadic argument will precisely match the pointer type expected by the recipient. Although the Standard could have specified that such implementations which would have no reason to care about pointer types should behave as though the pointers were cast to void*, the authors of C89 Standard avoided describing anything which wouldn't be common to all conforming compilers. The Standard's terminology for a construct that 99% of implementations should process identically, but 1% would might process unpredictably, is "Undefined Behavior". Implementations may, and often should, extend the semantics of the language by specifying how they will treat such constructs, but that's a Quality of Implementation issue outside the Standard's jurisdiction.

casting a pointer to integer issues warning on 64bit arch

I'm writing a linux kernel module that makes use of the exported symbol open_exec
struct file *open_exec(const char *name)
It returns a pointer, and I can check for an error with the IS_ERR macro:
if (IS_ERR(file))
return file;
During compile time, I get this warning:
warning: return makes integer from pointer without a cast
This is because my function here returns an integer. If I try to cast it:
return (int) file;
I don't get a warning on my 32bit machine, but I do on my 64bit machine:
warning: cast from pointer to integer of different size
This is because the sizeof of an int and a pointer are the same on 32bit, but they differ on a 64bit machine.
Casting it or not, the code appears to work. I'd just like to get rid of the warning.
How do I properly cast a pointer to an integer and get the value I expect, while not getting a compiler warning? The value I expect is essentially an integer listed in include/asm-generic/errno-base.h of the linux kernel code base.
Since I'm only looking at the pointer as if it was an integer in the case where IS_ERR() is true, I can be sure that it does in-fact only hold an integer value.
The PTR_ERR() macro in linux/err.h, which is where IS_ERR() is also defined, converts a pointer that's really an error code into the appropriate type (a long).
You should use something like:
if (IS_ERR(file))
return PTR_ERR(file);
Search for existing uses of PTR_ERR() in the source and you'll see this is a common pattern.
It might be appropriate for your function to return a long rather than an int - but all error codes should be representable in an int.
You can't properly cast a pointer to a type of smaller size, period. You could do some conversion if you were sure of what that pointer stored.
For example, if you know that a pointer has only lowest 32 bits set you can just cast it and use some compiler-specific pragma to suppress the warning. Or if you want to hash the pointer for using in something like a hash table you could xor the upper 32 bits with the lower 32 bits.
This can't be decided without more knowledge of how that int is used later.
Im not sure I get how you sometimes want to return an number from errno-base.h and sometimes a pointer -- how would the receiving function be able to tell the two apart? That being equal, then on Linux GCC,
int is 32bit wide irrespective of whether you are on 32 or 64bit
linux
pointers are 64 bit wide on 64 bit architectures, and 32 bite wide on
32 bit architectures
long are 32bit wide on 32bit architectures and 64 bit wide on 64 bit
architectures.
long long are always 64bit wide
hence on a 64bit architecture casting a pointer to an int means that you will case a 64bit value to a 32bit value, and you can be somewhat sure that you will lose part of the 64bit information from the pointer -- and this is what the compiler warning is all about, as you point out yourself.
If you want to cast from pointer to something 'anonymous' then your choices should be either long, long long or void* -- with the void* being the most portable.
The other alternative is to record it as an offset, that is if you have a large memory area where you want to 'cast' to a 32bit integer, then convert it to something like;
static struct mybigbuffer *globalbuffer;
int cast2int(void*x)
{
return (int)(globalbuffer-(struct mybigbuffer*)x);
}
however that is only work assuming that you know that your your memory will never exceed 2^31 records of globalbuf and that your pointers are assured to align on boundaries etc -- so unless you are 100% sure you know what you are doing, I would not recommended this either -- stick with the long or void* as the safe options.

Bit Shifts on a C Pointer?

I'm in the middle of this C project that I wish to make very memory efficient. In several cases, I am using the void *s of a dynamic array structure I wrote in order to hold bits. I wish to use all 64 (in this case) bits.
I soon realized that you cannot actually do any bit manipulation on a pointer. So my solution was the following:
void *p;
((unsigned long)p) << 4;
((unsigned long)p) & 3;
This gets the job done, but only because on my computer, longs and pointers are equal in size. Will this be the case in all (or most) architectures?
And my real question: Is there a more correct way to do bit manipulation on a pointer? I had thought that this approach was somewhat common in C (packing bits into a void *), but I could be mistaken...
If your compiler supports it, C99's <stdint.h> header provides the intptr_t and uintptr_t types that should be large enough to hold a pointer on your system, but are integers, so you can do bit manipulation. It can't really get much more portable than that, if that's what you're looking for.
If you need to do this kind of manipulation on pointers, you can cast them to intptr_t and uintptr_t, both of which can be found in stdint.h. These are guaranteed to be defined as the platform-specific integer type with enough bits to hold a pointer.
There's also ptrdiff_t in there, if you need something to hold the difference between two pointers.
I think you're trying to solve the wrong problem. The real problem is right here:
I am using the void *s of a dynamic
array structure I wrote in order to
hold bits.
Don't use void pointers to hold bits. Use void pointers to hold pointers. Use unsigned integers to hold bits.
Declare a union of the pointer and a bitfield.

Resources