How can one portably perform pointer arithmetic with single byte precision?
Keep in mind that:
char is not 1 byte on all platforms
sizeof(void) == 1 is only available as an extension in GCC
While some platforms may impose alignment restrictions on pointer dereferences, pointer arithmetic may still require finer granularity than the size of the smallest fundamental POD type
Your assumption is flawed - sizeof(char) is defined to be 1 everywhere.
From the C99 standard (TC3), in section 6.5.3.4 ("The sizeof operator"):
(paragraph 2) "The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type."
(paragraph 3) "When applied to an operand that has type char, unsigned char, or signed char, (or a qualified version thereof) the result is 1."
When these are taken together, it becomes clear that in C, whatever size a char is, that size is a "byte" (even if that's more than 8 bits, on some given platform).
A char is therefore the smallest addressable type. If you need to address in units smaller than a char, your only choice is to read a char at a time and use bitwise operators to mask out the parts of the char that you want.
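If you do need units smaller than a char, here is a minimal sketch of that masking technique; it assumes an 8-bit char purely for illustration and extracts the two 4-bit halves of a char with shifts and masks:

#include <stdio.h>

int main(void) {
    unsigned char c = 0xAB;            /* assumes CHAR_BIT == 8 for this example */
    unsigned low  = c & 0x0Fu;         /* mask out the low 4 bits:  0xB */
    unsigned high = (c >> 4) & 0x0Fu;  /* shift down the high 4 bits: 0xA */
    printf("%X %X\n", high, low);      /* prints "A B" */
    return 0;
}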
sizeof(char) always returns 1, in both C and C++. A char is always one byte long.
According to the standard char is the smallest addressable chunk of data. You just can't address with greater precision - you would need to do packing/unpacking manually.
sizeof(char) is guaranteed to be 1 by the C standard. Even if char uses 9 bits or more.
So you can do:
type *pt;
unsigned char *pc = (unsigned char *)pt;
And use pc for arithmetic. Converting pc back to pt with a cast is only valid if the resulting address is correctly aligned for the pointed-to type; otherwise the behavior is undefined by the C standard.
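For illustration, a minimal sketch of that technique (the array and names are made up); it converts back only at an address that is correctly aligned for int, which is the safe case:

#include <stdio.h>

int main(void) {
    int arr[2] = { 10, 20 };
    int *pt = arr;
    unsigned char *pc = (unsigned char *)pt;  /* byte-granular view of the array */

    pc += sizeof(int);   /* advance exactly one int's worth of C-bytes */
    pt = (int *)pc;      /* fine here: the address is int-aligned again */
    printf("%d\n", *pt); /* prints 20 */
    return 0;
}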
If char is more than 8-bits wide, you can't do byte-precision pointer arithmetic in portable (ANSI/ISO) C. Here, by byte, I mean 8 bits. This is because the fundamental type itself is bigger than 8 bits.
Cast the pointer to a uintptr_t. This will be an unsigned integer that is the size of a pointer. Now do your arithmetic on it, then cast the result back to a pointer of the type you want to dereference.
(Note that intptr_t is signed, which is usually NOT what you want! It's safer to stick to uintptr_t unless you have a good reason not to!)
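A minimal sketch of that recipe (the array and offset are made up). Keep in mind that, as another answer below explains, the standard only guarantees the unchanged round trip void * to uintptr_t and back; arithmetic on the integer behaves like char * arithmetic only on platforms where addresses map linearly to integers:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    int arr[2] = { 10, 20 };
    void *p = arr;

    uintptr_t u = (uintptr_t)p;  /* integer arithmetic, not pointer arithmetic */
    u += sizeof(int);            /* assumes a flat, linear address encoding */
    p = (void *)u;

    printf("%d\n", *(int *)p);   /* prints 20 on typical flat-memory platforms */
    return 0;
}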
I don't understand what you are trying to say with sizeof(void) being 1 in GCC. While type char might theoretically consist of more than 1 underlying machine byte, in C language sizeof(char) is 1 and always exactly 1. In other words, from the point of view of C language, char is always 1 "byte" (C-byte, not machine byte). Once you understand that, you'd also understand that sizeof(void) being 1 in GCC does not help you in any way. In GCC the pointer arithmetic on void * pointers works in exactly the same way as pointer arithmetic on char * pointers, which means that if on some platform char * doesn't work for you, then void * won't work for you either.
If on some platform char objects consist of multiple machine bytes, the only way to access smaller units of memory than a full char object would be to use bitwise operations to "extract" and "modify" the required portions of a complete char object. C language offers no way to directly address anything smaller than char. Once again char is always a C-byte.
The C99 standard defines uint8_t, which is exactly 8 bits wide (it exists only on platforms whose bytes have 8 bits). If the compiler doesn't support this type, you can define it yourself with a typedef. Of course you would need a different definition depending on the platform and/or compiler. Bundle everything in a header file and use it everywhere.
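A hypothetical sketch of such a header (the file name myint.h and the error message are made up); it assumes the only fallback needed is an 8-bit unsigned type:

/* myint.h -- fallback for compilers without <stdint.h> */
#ifndef MYINT_H
#define MYINT_H

#include <limits.h>

#if CHAR_BIT == 8
typedef unsigned char uint8_t;  /* unsigned char is exactly 8 bits here */
#else
#error "no 8-bit byte on this platform; define uint8_t some other way"
#endif

#endif /* MYINT_H */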
Related
I am working on a program where I have to read and modify the target process's memory.
So far I am using void* for storing addresses, and cast those to char* when I need to change them (add an offset or modify them in general).
I have heard of the uintptr_t type defined in stdint.h, but I don't see the difference in using it for pointer arithmetic over the char* conversion (which seems more C89-friendly, at least to me).
So my question: which of these two methods should I use for pointer arithmetic? Should I ever consider using uintptr_t over char*?
EDIT 1
Basically I just need to know if this yields true:
void* x = (void*)0x00F00BAA; /* hard-coded memory address in target process */
char* y = (void*)0x00F00BAA;
x = (uintptr_t)x + 0x123;
y = (char*)y + 0x123;
x == y?
x == (void*)0x00F00CCD?
y == (void*)0x00F00CCD?
In comments user R.. points out that the following is likely incorrect if the addresses the code is dealing with are not valid within the current process. I've asked the OP for clarification.
Do not use uintptr_t for pointer arithmetic if you care about the portability of your code. uintptr_t is an integer type. Any arithmetic operations on it are integer arithmetic, not pointer arithmetic.
If you have a void* value and you want to add a byte offset to it, casting to char* is the correct approach.
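A minimal sketch of that approach (the buffer and the 3-byte offset are made up):

#include <stdio.h>

int main(void) {
    char buf[16];
    void *p = buf;
    void *q = (char *)p + 3;                /* add a byte offset via char * */
    printf("%td\n", (char *)q - (char *)p); /* prints 3 */
    return 0;
}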
It's likely that arithmetic on uintptr_t values will work the same way as char* arithmetic, but it absolutely is not guaranteed. The only guarantee that the C standard provides is that you can convert a void* value to uintptr_t and back again, and the result will compare equal to the original pointer value.
And the standard doesn't guarantee that uintptr_t exists. If there is no integer type wide enough to hold a converted pointer value without loss of information, the implementation just won't define uintptr_t.
I've actually worked on systems (Cray vector machines) where arithmetic on uintptr_t wouldn't necessarily work. The hardware had 64-bit words, with a machine address containing the address of a word. The Unix-like OS needed to support 8-bit bytes, so byte pointers (void*, char*) contained a word address with a 3-bit offset stored in the otherwise unused high-order 3 bits of the 64-bit word. Pointer/integer conversions simply copied the representation. The result was that adding 1 to a char* pointer would cause it to point to the next byte (with the offset handled in software), but converting to uintptr_t and adding 1 would cause it to point to the next word.
Bottom line: If you need pointer arithmetic, use pointer arithmetic. That's what it's for.
(Incidentally, gcc has an extension that permits pointer arithmetic on void*. Don't use it in portable code. It also causes some odd side effects, like sizeof (void) == 1.)
My best-effort reading of the C specification (C99, primarily) makes me think that it is valid to cast (or implicitly convert, where void *'s implicit conversion behavior applies), between any of these types:
void *, char *, signed char *, unsigned char *
I expect that this will trigger no undefined behavior, and that those pointers are guaranteed to have the same underlying representation.
Consequently, it should be possible to take a pointer of either one of those four types that is already pointing to an address which can be legally dereferenced, typecast and/or assign it to one of the three char type pointers, and dereference it to access the same memory, with the only difference being whether your code will treat the data at that location as a char, signed char, or unsigned char.
Is this correct? Is there any version of the C standard (lack of a void * type in pre-standardization C notwithstanding) where this is not true?
P.S. I believe that this question is answered piecemeal in passing in a lot of other questions, but I've never seen a single clear answer where this is explicitly stated/confirmed.
Consequently, it should be possible to take a pointer of either one of those four types that is already pointing to an address which can be legally dereferenced, typecast and/or assign it to one of the three char type pointers, and dereference it to access the same memory, with the only difference being whether your code will treat the data at that location as a char, signed char, or unsigned char.
This is correct. In fact you could take a valid pointer to an object of any type and convert it to some of those three and access the memory.
You correctly mention the provision about void * and char * etc. having the same representation and alignment requirements, but that actually does not matter. That refers to the properties of the pointer itself, not the properties of the objects being pointed to.
The strict aliasing rule is not violated because that contains an explicit provision that a character type may be used to read or write any object.
Note that if we have for example, signed char ch = -2;, or any other negative value, then (unsigned char)ch may differ from *(unsigned char *)&ch. On a system with 8-bit characters, the former is guaranteed to be 254 but the latter could be 254, 253, or 130 depending on the numbering system in use.
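A minimal sketch of that distinction; in the common two's-complement, 8-bit-char case, both lines print 254:

#include <stdio.h>

int main(void) {
    signed char ch = -2;
    printf("%d\n", (unsigned char)ch);     /* value conversion: guaranteed 254 with 8-bit chars */
    printf("%d\n", *(unsigned char *)&ch); /* raw representation: 254 only on two's complement */
    return 0;
}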
In this video, taken from Stanford's CS107 lecture, the professor seems to state that casting a void* to a char* will do the same thing in terms of arithmetic as casting it to an unsigned long.
http://www.youtube.com/watch?v=_eR4rxnM7Lc&t=44m30s
The part in question goes from 44:30 to around 46:00
He says they are "both 4-byte figures"
I understand casting the void* to a char*, because pointer arithmetic will then advance in units of sizeof(char) == 1.
But I don't get how you could do the same thing by casting it to an unsigned long* because the arithmetic will be in units of 4. What am I missing?
He says they are "both 4-byte figures"
This may well be true on a particular platform, but neither is guaranteed to be the case in general.
But I don't get how you could do the same thing by casting it to an unsigned long* because the arithmetic will be in units of 4. What am I missing?
He is not casting to unsigned long*, he is casting to unsigned long.
The statement may be true on that professor's particular machine on a particular Tuesday of last year, but in general, it's wrong. If char * and unsigned long could be treated the same, C wouldn't need two distinct types.
What the professor probably wanted to say is the following rule:
For a variable of any pointer type (except void pointers), the following holds: p + 1 == (T*) (((char*)p) + sizeof(*p)) (where T is the type of *p), i.e. adding 1 to a pointer increases it by the size of the type that it points to.
Since sizeof(char) == 1, x + 1 yields the same address whether x is a char * or an unsigned long holding that address, provided that sizeof(char *) == sizeof(unsigned long), which is not to be assumed unless "you know what you're doing".
Note that the actual representation might differ for various reasons, most notably since unsigned long may have padding bits anywhere in its representation.
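A minimal sketch that checks the rule above (the array type is arbitrary):

#include <assert.h>

int main(void) {
    long arr[4] = { 0 };
    long *p = arr;
    /* Adding 1 to p advances it by sizeof(*p) bytes: */
    assert((char *)(p + 1) == (char *)p + sizeof *p);
    return 0;
}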
Is the size of the datatype "int" always equal to the size of a pointer in the C language?
I'm just curious.
Not at all, there is no guarantee that sizeof(int) == sizeof(void*). On Linux/AMD64, sizeof(int) is 4 bytes and sizeof(void*) is 8 bytes (the same as sizeof(long) on that platform).
Recent C standards (e.g. C99) define a standard header <stdint.h> which may provide, among others, an integer type intptr_t that a void * can be converted to and back again without loss of information (the type is optional, though).
The standard does not guarantee that all pointers have the same size; in particular, pointers to functions can be "bigger" than data pointers (though I cannot name a platform where that is true). I believe recent POSIX standards require function pointers to be convertible to data pointers (e.g. for dlsym(3)).
See also this C reference and the n1570 draft C11 standard (or better)
PS. In 2021, 64-bit Windows is a common platform with sizeof(long) != sizeof(void*), since long stays 32 bits there. In the previous century, the old Intel 286 could also have been such a platform.
No. For example, on most 64-bit systems, int is 4 bytes and void* is 8.
It is not guaranteed.
For example, on most 64-bit systems the two sizes differ.
Even sizeof (int *) is not guaranteed to be equal to sizeof (void *).
The only size guarantee for void * is
sizeof (void *) == sizeof (char *)
== sizeof (signed char *) == sizeof (unsigned char *)
because void * and the character pointers are required to have the same representation.
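A quick sketch for checking the sizes on your own platform; on x86-64 Linux this typically prints 4, 8, 8, 8:

#include <stdio.h>

int main(void) {
    printf("sizeof(int)    = %zu\n", sizeof(int));
    printf("sizeof(void *) = %zu\n", sizeof(void *));
    printf("sizeof(char *) = %zu\n", sizeof(char *));
    printf("sizeof(long)   = %zu\n", sizeof(long));
    return 0;
}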
No. Some (mostly older, VAX-era) code assumes this, but it's definitely not required, and assuming it is not portable. There are real implementations where the two differ (e.g., some current 64-bit environments use a 64-bit pointer and 32-bit int).
The C language gives no guarantees of anything when it comes to integer or pointer sizes.
The size of int is typically the same as the data bus width, but not necessarily. The size of a pointer is typically the same as the address bus width, but not necessarily.
Many compilers use non-standard extensions like the far keyword, to access data beyond the width of the default pointer type.
In addition to 64-bit systems, there are also plenty of microcontroller/microprocessor architectures where the size of int and the size of a pointer are different. Windows 3.1 and DOS are other examples.
There's no guarantee of any relation between the sizes of these two types, nor that either can be faithfully represented in the other via round-trip casts. It's all implementation-defined.
With that said, in the real world, unless you're dealing with really obscure legacy 16-bit systems or odd DSPs or such, sizeof(int) is going to be less than or equal to sizeof(void *), and you can faithfully convert int values to void * to pass them to interfaces (like pthread_create) that take a generic void * argument to avoid wasteful allocation and freeing of memory to store a single int. In particular, if you're using POSIX or Windows interfaces already, this is definitely a safe real-world assumption to make.
You should never assume void * can be faithfully represented in int (i.e. casting a pointer to int and back). This does not work on any popular real-world 64-bit systems, and the percentage of systems it works on is sure to plummet in the near future.
No. Pointer types do not have to be the same size or representation as integer types. Here are a few relevant sections from the C language standard (online draft available here):
6.2.5 Types
...
27 A pointer to void shall have the same representation and alignment requirements as a pointer to a character type.39) Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements. All pointers to structure types shall have the same representation and alignment requirements as each other. All pointers to union types shall have the same representation and alignment requirements as each other. Pointers to other types need not have the same representation or alignment requirements.
...
39) The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.
...
6.3.2.3 Pointers
...
5 An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.56)
6 Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.
...
56) The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.
No, it doesn't have to be. It is often the case that sizeof(long) == sizeof(void*), but not always: on 64-bit Windows, for example, long is 4 bytes while pointers are 8.
struct foo { unsigned x:1; } f;
printf("%d\n", (int)sizeof(f.x = 1));
What is the expected output and why? Taking the size of a bitfield lvalue directly isn't allowed. But by using the assignment operator, it seems we can still take the size of a bitfield type.
What is the "size of a bitfield in bytes"? Is it the size of the storage unit holding the bitfield? Is it the number of bits taken up by the bf rounded up to the nearest byte count?
Or is the construct undefined behavior because there is nothing in the standard that answers the above questions? Multiple compilers on the same platform are giving me inconsistent results.
You are right, integer promotions aren't applied to the operand of sizeof:
The integer promotions are applied only: as part of the usual arithmetic conversions, to certain argument expressions, to the operands of the unary +, -, and ~ operators, and to both operands of the shift operators, as specified by their respective subclauses.
The real question is whether bitfields have their own types.
Joseph Myers told me:
"The conclusion from C90 DRs was that bit-fields have their own types, and from C99 DRs was to leave whether they have their own types implementation-defined, and GCC follows the C90 DRs and so the assignment has type int:1 and is not promoted as an operand of sizeof."
This was discussed in Defect Report #315.
To summarize: your code is legal but implementation-defined.
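For experimenting with different compilers, here is a self-contained version of the test program; since the construct is implementation-defined, the output may differ between compilers:

#include <stdio.h>

struct foo { unsigned x:1; } f;

int main(void) {
    /* Implementation-defined: a compiler that gives the assignment the
       bit-field's own type may print 1; one that uses the declared base
       type prints sizeof(unsigned int), typically 4. */
    printf("%d\n", (int)sizeof(f.x = 1));
    return 0;
}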
The C99 Standard (PDF of latest draft) says in section 6.5.3.4 about sizeof constraints:
The sizeof operator shall not be applied to an expression that has function type or an incomplete type, to the parenthesized name of such a type, or to an expression that designates a bit-field member.
This means that applying sizeof to an assignment expression is allowed.
6.5.16.3 says:
The type of an assignment expression is the type of the left operand ...
6.3.1.1.2 says regarding integer promotions:
The following may be used in an expression wherever an int or unsigned int may be used:
...
A bit-field of type _Bool, int, signed int, or unsigned int.
If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int.
So, your test program should output the size of an int, i.e.,
sizeof(int).
Is there any compiler that does not do this?
Trying to get the size of a bitfield isn't legal, as you have seen. (sizeof returns the size in bytes, which wouldn't make much sense for a bitfield.)
sizeof(f.x = 1) will return the size of the type of the expression. Since C doesn't have a real "bitfield type", the expression (here: an assignment expression) usually gets the type of the bit field's base type, in your example unsigned int, but it is possible for a compiler to use a smaller type internally (in this case probably unsigned char because it's big enough for one bit).
sizeof( f.x = 1)
returns 1 as its answer. The sizeof(1) is presumably the size of an integer on the platform you are compiling on, probably either 4 or 8 bytes.
No, you must be thinking of the == operator, which yields a "boolean" expression of type int in C and indeed bool in C++.
I think the expression will convert the value 1 to the corresponding bitfield type and assign it to the bitfield. The result should also be a bitfield type because there are no hidden promotions or conversions that I can see.
Thus we are effectively getting access to the bitfield type.
No compiler diagnostic is required because "f.x = 1" isn't an lvalue, i.e. it does not designate the bitfield directly. It's just a value of type "unsigned :1".
I'm specifically using "f.x = 1" because "sizeof f.x" takes the size of a bitfield lvalue, which is clearly not allowed.
The sizeof(1) is presumably the size of an integer on the platform you are compiling on, probably either 4 or 8 bytes.
Note that I'm NOT taking sizeof(1), which is effectively sizeof(int). Look closely: I'm taking sizeof(f.x = 1), which should effectively be sizeof(bitfield_type).
I'd like to see a reference to something that tells me whether the construct is legal. As an added bonus, it would be nice if it told me what sort of result is expected.
gcc certainly disagrees with the assertion that sizeof(bitfield_type) should be the same as sizeof(int), but only on some platforms.
Trying to get the size of a bitfield isn't legal, as you have seen. (sizeof returns the size in bytes, which wouldn't make much sense for a bitfield.)
So are you stating that the behavior is undefined, i.e. it has the same degree of legality as "*(int *)0 = 0;", and compilers can choose to fail to handle this sensibly?
That's what I'm trying to find out. Do you assume that it's undefined by omission, or is there something that explicitly declares it as illegal?
The
(f.x = 1)
is not an expression, it is an assignment and thus returns the assigned value. In this case, the size of that value depends on the variable it has been assigned to.
unsigned x:1
has 1 bit, and its sizeof returns 1 byte (8-bit alignment)
If you would use
unsigned x:12
then the sizeof(f.x = 1) would return 2 bytes (again because of the 8-bit alignment)
is not an expression, it is an assignment and thus returns the assigned value. In this case, the size of that value depends on the variable it has been assigned to.
First, it IS an expression containing the assignment operator.
Second, I'm quite aware of what's happening in my example :)
then the sizeof(f.x = 1) would return 2 bytes (again because of the 8-bit alignment)
Where did you get this? Is this what happens on a particular compiler that you have tried, or are these semantics stated in the standard? Because I haven't found any such statements. I want to know whether the construct is guaranteed to work at all.
in this second example, if you define your struct as
struct foo { unsigned x:12; } f;
and then write a value like 1 into f.x, it uses 2 bytes because of the alignment. If you do an assignment like
f.x = 1;
and this returns the assigned value. This is quite similar to
int a, b, c;
a = b = c = 1;
where the assignment is evaluated from right to left. c = 1 assigns 1 to the variable c, and this assignment returns the assigned value and assigns it to b (and so forth) until 1 is assigned to a
it is equal to
a = ( b = ( c = 1 ) )
in your case, the sizeof gets the size of your assignment, which is NOT a bit-field, but the variable assigned to it.
sizeof ( f.x = 1)
does not return the bit-field's size, but that of the assigned value, which is a 12-bit representation of the 1 (in my case), and therefore sizeof() returns 2 bytes (because of the 8-bit alignment)
Look, I understand full well what I'm doing with the assignment trick.
You are telling me that the size of a bitfield type is rounded up to the closest byte count, which is one option I listed in the initial question. But you didn't back it up with references.
In particular, I have tried various compilers which give me sizeof(int) instead of sizeof(char) EVEN if I apply this to a bitfield that has only a single bit.
I wouldn't even mind if multiple compilers randomly get to choose their own interpretation of this construct. Certainly bitfield storage allocation is quite implementation-defined.
However, I really do want to know whether the construct is GUARANTEED to work and yield SOME value.
CL, I've seen your citations before, and agree they're totally relevant, but even after having read them I wasn't sure whether the code is defined.
6.3.1.1.2 says regarding integer promotions:
Yes, but integer promotion rules only apply if a promotion is in fact carried out. I do not think that my example requires a promotion. Likewise if you do
char ch;
sizeof ch;
... then ch also isn't promoted.
I think we are dealing directly with the bitfield type here.
I've also seen gcc output 1 while many other compilers (and even other gcc versions) don't. This doesn't convince me that the code is illegal because the size could just as well be implementation-defined enough to make the result inconsistent across multiple compilers.
However, I'm confused as to whether the code may be undefined because nothing in the standard seems to state how the sizeof bitfield case is handled.
Wouldn't
(f.x = 1)
be an expression evaluating to true (technically it evaluates to the result of the assignment, which is 1/true in this case), and thus,
sizeof( f.x = 1)
is asking for the size of true in terms of how many chars it would take to store it?
I should also add that the Wikipedia article on sizeof is nice. In particular, they say "sizeof is a compile-time operator that returns the size, in multiples of the size of char, of the variable or parenthesized type-specifier that it precedes."
The article also explains that sizeof works on expressions.
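A small sketch of both forms; note that the operand of a sizeof expression is not evaluated (except for variable length arrays):

#include <stdio.h>

int main(void) {
    double d = 0.0;
    printf("%zu\n", sizeof(double)); /* parenthesized type name */
    printf("%zu\n", sizeof d);       /* expression: parentheses optional */
    printf("%zu\n", sizeof (d + 1)); /* type of the expression is double; d + 1 is never evaluated */
    return 0;
}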