Should pointers be signed or unsigned in C?

I have a function get_picture() that takes a picture. It returns a pointer to uint8_t (where the picture is stored) and takes a pointer to a variable that stores the length of the picture.
Here is the declaration:
uint8_t * get_picture(int *piclength)
Here I call it in main():
unsigned int address, value;
address = (unsigned int)get_picture((int*)& value);
My question is: because address is storing an address (which is positive), should I actually define it as an int?

I'm not sure you understand pointers.
If your function returns a uint8_t * then you should be storing it in uint8_t * not an int.
As an example:
uint8_t* get_picture(int* piclength);
int piclength;
uint8_t* address;
address = get_picture(&piclength);

If you really want to convert a data-pointer to an integer, use the dedicated typedef instead of some random (and possibly too small) type:
uintptr_t / intptr_t (Optional typedefs in <stdint.h>)
Still, the need is rare, and I don't see it here.

It depends on what you are really after. Your code is fine as is if you want address to contain the address of where that picture lives. Likewise you could use an int, since bits is bits, int is the same number of bits as unsigned int and whatever consumes address can be fed those bits. It makes more sense as a human to think of addresses as unsigned, but the compiler and hardware don't care, bits is bits.
But depending on what you are doing you may want to as mentioned already, preserve this address using a pointer of the same type. See Dragan's answer.
If you want to "see" the address then it depends on how you want to see it, converting it to an unsigned int is one easy and generic way to do it.
Yes, this is very system dependent and the size of int varies by toolchain and target and may or may not completely hold an address for that system, so some masking may be required by the consumer of that variable.
So your code is fine, and I think I understand the question. Signed or unsigned is in the eye of the beholder; a value is only signed or unsigned for particular operations. Addresses are themselves neither signed nor unsigned, they are just bits on an address bus. For a sane compiler, unsigned int and int are the same size and store the same number of bits, so as long as the compiler makes them at least as wide as the address it uses for a pointer, this will work just fine with int or unsigned int. Int feels a little wrong, unsigned int feels right, but those are human emotions. The hardware doesn't care, so long as the bits don't change on their way to the address bus.
Now if for some reason the code we don't see prints this variable as a decimal, for example printf("%d\n",address); (why would you printf on a microcontroller?), then it may look strange to humans but will still be the right decimal interpretation of the bit pattern that is the address. printf("0x%X\n",address); would make more sense and be more generic. If your printf supports it you could just printf("%p",(void *)address); using Dragan's uint8_t * address declaration, which is what many folks here are probably thinking based on classical C training.
The other view is that bits are bits and have no meaning whatsoever to the hardware until used, and only for that use case. An address is only an address on the address bus; when doing math on it to compute another address it is not an address, it is a bit pattern being fed into the ALU, and signed or unsigned might depend on the operation (add and subtract don't know signed from unsigned, multiply and divide do).
If you choose not to use uint8_t * address as a declaration, then unsigned int "feels" better and is less likely to mess you up (if you have enough bits in an (unsigned) int for that compiler to store an address in the first place). A signed int feels a little wrong, but technically should work. My rule is to use signed only when you specifically need signed and unsigned everywhere else, which saves on a lot of bugs. Unfortunately, C libraries traditionally do it the other way around, which made a big mess before the stdint.h stuff came about.

Related

Don't understand conversion from integer to smaller pointer error

I am working on an 8051 platform which has a 16 bit pointer width.
I have a common code module for handling flash emulation and there's a function that returns the 16 bit start address of a page:
volatile u16_t start_address = find_start_address_of_page( page );
I then want to pass this 'address' to a CRC function that wants a u8_t* as a parameter, so I cast it in the function call like so:
(u8_t *)start_address
This generates the warning
Warning[Pe1053]: conversion from integer to smaller pointer
Which confuses me a bit, because a u8_t* is 16 bits wide, and my variable is a 16 bit variable. Is it simply that the compiler is warning about an "integer to pointer" conversion in general?
The code works fine, I just want to be sure I'm not missing something silly here...
You write that your 8051 platform has a 16 bit pointer width.
As far as I know, the 8051 has different address ranges for
- the internal RAM in the processor (max 256 Byte)
- external RAM (max 64k)
- Program Memory (max 64k)
The compiler I have worked with (Keil) therefore had at least four different pointer types.
An 8 bit wide 'data' pointer for the internal RAM.
A 16 bit wide 'xdata' pointer for the external RAM.
A 16 bit wide 'code' pointer for the Program Memory.
A 24 bit wide universal pointer which could be set to point to any of the three memory types. The first byte was used to select the memory type.
The warning text could mean that the compiler wants to convert your 16 bit value to an address in internal RAM which is only 8 bits wide.
If you want to silence your warning you could use a union to move your information into a different type, i.e.
union {
u16_t origType;
u8_t *newtype;
} u;
u.origType = start_address;
Assuming they are the same size, you can then pass u.newtype into your function.
Since start_address is a variable that holds a memory address, you should declare and use it as such, meaning a pointer:
volatile u16_t *start_address = find_start_address_of_page( page );
Of course this also means that your function find_start_address_of_page(); has to return a pointer.
By the way, the fact that int and int * are both 16 bits wide (on your processor) is not enough. For example, a pointer to an int on most (all?) 16 bit processors has to be aligned to an even (multiple of 2) address, because of limitations of the assembler instructions and/or the data bus implementation.
In the same way, things like start_address++; increment differently depending on whether it is an int or an int * (or even a char *). If it is an int (or a char *) it will increment by one, but if it is an int * it will increment by two (the size of an int on your platform).
By this I'm trying to show that the compiler makes a lot of checks beyond the number of bits, depending on the type of variable (and the processor abilities).
start_address is of type u16_t, and not a pointer.
If you want to pass its address to CRC, then try this:
((u8_t *)((u16_t *)start_address))

Does casting remove endian dependency in C/C++?

i.e. if we cast a C or C++ unsigned char array named arr as (unsigned short*)arr and then assign to it, is the result the same independent of machine endianness?
Side note - I saw the discussion on IBM and elsewhere on SO with example:
unsigned char endian[2] = {1, 0};
short x;
x = *(short *) endian;
...stating that the value of x will depend on the layout of endian, and hence the endianness of the machine. That means dereferencing an array is endian-dependent, but what about assigning to it?
*(short*) endian = 1;
Are all future short-casted dereferences then guaranteed to return 1, regardless of endianness?
After reading the responses, I wanted to post some context:
In this struct
struct pix {
unsigned char r;
unsigned char g;
unsigned char b;
unsigned char a;
unsigned char y[2];
};
replacing unsigned char y[2] with unsigned short y makes no individual difference, but if I make an array of these structs and put that in another struct, then I've noticed that the size of the container struct tends to be higher for the "unsigned short" version, so, since I intend to make a large array, I went with unsigned char[2] to save space overhead. I'm not sure why, but I imagine it's easier to align the uchar[2] in memory.
Because I need to do a ton of math with that variable y, which is meant to be a single short-length numerical value, I find myself casting to short a lot just to avoid individually accessing the uchar bytes... sort of a fast way to avoid ugly byte-specific math, but then I thought about endianness and whether my math would still be correct if I just cast everything like
*(unsigned short*)this->operator()(x0, y0).y = (ySum >> 2) & 0xFFFF;
...which is a line from a program that averages 4-adjacent-neighbors in a 2-D array, but the point is that I have a bunch of these operations that need to act on the uchar[2] field as a single short, and I'm trying to find the lightest (i.e. without an endian-based if-else statement every time I need to access or assign), endian-independent way of working with the short.
Thanks to strict pointer aliasing it's undefined behaviour, so it might be anything. If you'd do the same with a union however the answer is no, the result is dependent on machine endianness.
Each possible value of short has a so-called "object representation"[*], which is a sequence of byte values. When an object of type short holds that value, the bytes of the object hold that sequence of values.
You can think of endianness as just being one of the ways in which the object representation is implementation-dependent: does the byte with the lowest address hold the most significant bits of the value, or the least significant?
Hopefully this answers your question. Provided you've safely written a valid object representation of 1 as a short into some memory, when you read it back from the same memory you'll get the same value again, regardless of what the object representation of 1 actually is in that implementation. And in particular regardless of endianness. But as the others say, you do have to avoid undefined behavior.
[*] Or possibly there's more than one object representation for the same value, on exotic architectures.
Yes, all future dereferences will return 1 as well: As 1 is in range of type short, it will end up in memory unmodified and won't change behind your back once it's there.
However, the code itself violates effective typing: It's illegal to access an unsigned char[2] as a short, and may raise a SIGBUS if your architecture doesn't support unaligned access and you're particularly unlucky.
However, character-wise access of any object is always legal, and a portable version of your code looks like this:
short value = 1;
unsigned char *bytes = (unsigned char *)&value;
How value is stored in memory is of course still implementation-defined, ie you can't know what the following will print without further knowledge about the architecture:
assert(sizeof value == 2); // check for size 2 shorts
printf("%i %i\n", bytes[0], bytes[1]);

When printf is an address of a variable, why use void*?

I saw some usage of (void*) in printf().
If I want to print a variable's address, can I do it like this:
int a = 19;
printf("%d", &a);
I think, &a is a's address which is just an integer, right?
Many articles I read use something like this:
printf("%p", (void*)&a);
What does %p stand for? (A pointer?)
Why use (void*)? Can't I use (int)&a instead?
Pointers are not numbers. They are often internally represented that way, but they are conceptually distinct.
void* is designed to be a generic pointer type. Any pointer value (other than a function pointer) may be converted to void* and back again without loss of information. This typically means that void* is at least as big as other pointer types.
printf's "%p" format requires an argument of type void*. That's why an int* should be cast to void* in that context. (There's no implicit conversion because it's a variadic function; there's no declared parameter, so the compiler doesn't know what to convert it to.)
Sloppy practices like printing pointers with "%d", or passing an int* to printf with a "%p" format, are things that you can probably get away with on most current systems, but they render your code non-portable. (Note that it's common on 64-bit systems for void* and int to be different sizes, so printing pointers with "%d" is really non-portable, not just theoretically.)
Incidentally, the output format for "%p" is implementation-defined. Hexadecimal is common, (in upper or lower case, with or without a leading "0x" or "0X"), but it's not the only possibility. All you can count on is that, assuming a reasonable implementation, it will be a reasonable way to represent a pointer value in human-readable form (and that scanf will understand the output of printf).
The article you read is entirely correct. The correct way to print an int* value is
printf("%p", (void*)&a);
Don't take the lazy way out; it's not at all difficult to get it right.
Suggested reading: Section 4 of the comp.lang.c FAQ. (Further suggested reading: All the other sections.)
EDIT:
In response to Alcott's question:
There is still one thing I don't quite understand. int a = 10; int *p = &a;, so p's value is a's address in mem, right? If right, then p's value will range from 0 to 2^32-1 (if cpu is 32-bit), and an integer is 4-byte on 32-bit OS, right? then What's the difference between the p's value and an integer? Can p's value go out of the range?
The difference is that they're of different types.
Assume a system on which int, int*, void*, and float are all 32 bits (this is typical for current 32-bit systems). Does the fact that float is 32 bits imply that its range is 0 to 2^32-1? Or -2^31 to 2^31-1? Certainly not; the range of float (assuming IEEE representation) is approximately -3.40282e+38 to +3.40282e+38, with widely varying resolution across the range, plus exotic values like negative zero, subnormalized numbers, denormalized numbers, infinities, and NaNs (Not-a-Number). int and float are both 32 bits, and you can take the 32 bits of a float object and treat it as an int representation, but the result won't have any straightforward relationship to the value of the float. The second low-order bit of an int, for example, has a specific meaning; it contributes 0 to the value if it's 0, and 2 to the value if it's 1; the corresponding bit of a float has a meaning, but it's quite different (it contributes a value that depends on the value of the exponent).
The situation with pointers is quite similar. A pointer value has a meaning: it's the address of some object (or any of several other things, but we'll set that aside for now). On most current systems, interpreting the bits of a pointer object as if it were an integer gives you something that makes sense on the machine level. But the language itself does not guarantee, or even hint, that that's the case.
Pointers are not numbers.
A concrete example: some years ago, I ran across some code that tried to compute the difference in bytes between two addresses by casting to integers. It was something like this:
unsigned char *p0;
unsigned char *p1;
long difference = (unsigned long)p1 - (unsigned long)p0;
If you assume that pointers are just numbers, representing addresses in a linear monolithic address space, then this code makes sense. But that assumption is not supported by the language. And in fact, there was a system on which that code was intended to run (the Cray T90) on which it simply would not have worked. The T90 had 64-bit pointers pointing to 64-bit words. Byte pointers were synthesized in software by storing an offset in the 3 high-order bits of a pointer object. Subtracting two pointers in the above manner, if they both had 0 offsets, would give you the number of words, not bytes, between the addresses. And if they had non-0 offsets, it would give you meaningless garbage. (Conversion from a pointer to an integer would just copy the bits; it could have done the work to give you a meaningful byte index, but it didn't.)
The solution was simple: drop the casts and use pointer arithmetic:
long difference = p1 - p0;
Other addressing schemes are possible. For example, an address might consist of a descriptor that (perhaps indirectly) references a block of memory, plus an offset within that block.
You can assume that addresses are just numbers, that the address space is linear and monolithic, that all pointers are the same size and have the same representation, that a pointer can be safely converted to int, or to long, and back again without loss of information. And the code you write based on those assumptions will probably work on most current systems. But it's entirely possible that some future systems will again use a different memory model, and your code will break.
If you avoid making any assumptions beyond what the language actually guarantees, your code will be far more future-proof. And even leaving portability issues aside, it will probably be cleaner.
So much insanity present here...
%p is generally the correct format specifier to use if you just want to print out a representation of the pointer. Never, ever use %d.
The length of an int and the length of a pointer (void* or otherwise) have no relationship. Most data models on i386 just happen to have 32-bit ints AND 32-bit pointers -- other platforms, including x86-64, are not the same! (This is also historically known as "all the world's a VAX syndrome".) http://en.wikipedia.org/wiki/64-bit#64-bit_data_models
If for some reason you want to hold a memory address in an integral variable, use the right types! intptr_t and uintptr_t. They're in stdint.h. See http://en.wikipedia.org/wiki/Stdint.h#Integers_wide_enough_to_hold_pointers
In C void * is an un-typed pointer. void does not mean void... it means anything. Thus casting to void * would be the same as casting to "pointer" in another language.
Using (int *)&a should work too... but the stylistic point of saying (void *) is to say -- I don't care about the type -- just that it is a pointer.
Note: It is possible for an implementation of C to cause this construct to fail and still meet the requirements of the standards. I don't know of any such implementations, but it is possible.
Although the vast majority of C implementations store pointers to all kinds of objects using the same representation, the C Standard does not require that all implementations do so, nor does it even provide any means by which a program which would exploit commonality of representations could test whether an implementation follows the common practice and refuse to run if an implementation doesn't.
If on some particular platform, an int* held a word address, while both char* and void* combine a word address with a word that identifies a byte within a word, passing an int* to a function that is expecting to retrieve a variadic argument of type char* or void* would result in that function trying to fetch more data from the stack (a word address plus the supplemental word) than had been pushed (just the word address). This could cause the system to malfunction in unpredictable ways.
Many compilers for commonplace platforms that use the same representation for all pointers will process an action which passes a non-void pointer precisely the same way as they would process an action which casts the pointer to void* before passing it. They thus have no reason to care about whether the pointer type that is passed as a variadic argument will precisely match the pointer type expected by the recipient. Although the Standard could have specified that such implementations which would have no reason to care about pointer types should behave as though the pointers were cast to void*, the authors of the C89 Standard avoided describing anything which wouldn't be common to all conforming compilers. The Standard's terminology for a construct that 99% of implementations should process identically, but 1% might process unpredictably, is "Undefined Behavior". Implementations may, and often should, extend the semantics of the language by specifying how they will treat such constructs, but that's a Quality of Implementation issue outside the Standard's jurisdiction.

casting a pointer to integer issues warning on 64bit arch

I'm writing a linux kernel module that makes use of the exported symbol open_exec
struct file *open_exec(const char *name)
It returns a pointer, and I can check for an error with the IS_ERR macro:
if (IS_ERR(file))
return file;
During compile time, I get this warning:
warning: return makes integer from pointer without a cast
This is because my function here returns an integer. If I try to cast it:
return (int) file;
I don't get a warning on my 32bit machine, but I do on my 64bit machine:
warning: cast from pointer to integer of different size
This is because the sizeof of an int and a pointer are the same on 32bit, but they differ on a 64bit machine.
Casting it or not, the code appears to work. I'd just like to get rid of the warning.
How do I properly cast a pointer to an integer and get the value I expect, while not getting a compiler warning? The value I expect is essentially an integer listed in include/asm-generic/errno-base.h of the linux kernel code base.
Since I'm only looking at the pointer as if it was an integer in the case where IS_ERR() is true, I can be sure that it does in-fact only hold an integer value.
The PTR_ERR() macro in linux/err.h, which is where IS_ERR() is also defined, converts a pointer that's really an error code into the appropriate type (a long).
You should use something like:
if (IS_ERR(file))
return PTR_ERR(file);
Search for existing uses of PTR_ERR() in the source and you'll see this is a common pattern.
It might be appropriate for your function to return a long rather than an int - but all error codes should be representable in an int.
You can't properly cast a pointer to a type of smaller size, period. You could do some conversion if you were sure of what that pointer stored.
For example, if you know that a pointer has only lowest 32 bits set you can just cast it and use some compiler-specific pragma to suppress the warning. Or if you want to hash the pointer for using in something like a hash table you could xor the upper 32 bits with the lower 32 bits.
This can't be decided without more knowledge of how that int is used later.
I'm not sure I get how you sometimes want to return a number from errno-base.h and sometimes a pointer -- how would the receiving function be able to tell the two apart? That aside, on Linux with GCC:
- int is 32 bits wide irrespective of whether you are on 32 or 64 bit Linux
- pointers are 64 bits wide on 64 bit architectures, and 32 bits wide on 32 bit architectures
- long is 32 bits wide on 32 bit architectures and 64 bits wide on 64 bit architectures
- long long is always 64 bits wide
Hence on a 64 bit architecture casting a pointer to an int means that you will cast a 64 bit value to a 32 bit value, and you can be somewhat sure that you will lose part of the 64 bit information from the pointer -- and this is what the compiler warning is all about, as you point out yourself.
If you want to cast from pointer to something 'anonymous' then your choices should be either long, long long or void* -- with the void* being the most portable.
The other alternative is to record it as an offset, that is if you have a large memory area where you want to 'cast' to a 32bit integer, then convert it to something like;
static struct mybigbuffer *globalbuffer;
int cast2int(void*x)
{
return (int)((struct mybigbuffer*)x-globalbuffer); /* index of x counted from the start of globalbuffer */
}
However, that only works assuming you know that your memory will never exceed 2^31 records of globalbuffer and that your pointers are assured to align on record boundaries, etc. -- so unless you are 100% sure you know what you are doing, I would not recommend this either -- stick with long or void* as the safe options.

Proper way of using simple integer and memsizes

I would like to know the proper way of using simple integers and memsize. To be precise,
I have C code initially written for a 32 bit architecture. Now it has to run on both architectures, so, unsurprisingly, I get the following warning when compiling for the 64 bit architecture:
warning: cast to pointer from integer of different size
I am trying to remove those warnings using the memsize, intptr_t and uintptr_t. But I have a doubt if it works properly if we use mixed simple integer and memsizes. I would like to know the proper way of using it. Following is the sample of code.
compllits = list_Cons((POINTER) predindex, compllits);
Here compllits is a linked list and is defined as pointer . list_Cons returns pointer. list_Cons is defined as:
list_Cons(POINTER x, LIST y);
And, int preindex. I am casting the integer into Pointer. As I run it in 64-bit machine , I will get the warning
: warning: cast to pointer from integer of different size
Now, to resolve this warning, I am a little confused about the two methods I am using:
Method 1: changing the int preindex into intptr_t preindex.
Method 2. Keeping int preindex unchanged but doing following
compllits = list_Cons((POINTER)(intptr_t)predindex, compllits);
Both the ways are working. But I am not sure which method is legal and best?
Looking for some suggestions.
Thanks
The big question is whether you really have to mix pointers and integers. (One of the few cases where this is needed is when handling lisp-like generic data structures.) If not, you should use the correct type, and that type only.
However, if this is the case, do you really need to handle them using the same function? For example, you could have list_Cons_pointer and list_Cons_int that accept a real pointer and an integer type matching preindexed, respectively.
Whether or not you should change the type of preindexed really depends on what it represents in the program.
Apart from this, an intptr_t is guaranteed to be large enough to hold a pointer, but it might be larger. This means that there is really no way to get rid of all warnings in all possible environments (think 48 bit pointers...)
Is preindex really a pointer? If so then your problem is using int as a pointer type. Use int *.
Also, I'd recommend using int * rather than intptr_t. intptr_t is an integer that is wide enough to hold a pointer, but semantically it is still an integer.
On a 32bit machine, int is 32 bits wide and int * is also 32 bits wide. On a 64 bit machine int is still 32 bits wide, but int * is 64 bits wide.
