There was a question here about equality of sizeof(size_t) and sizeof(void*) and the accepted answer was that they are not guaranteed to be equal.
But at least, must it be that:
sizeof(void*) >= sizeof(size_t)
I think so. Take the largest object possible in a given C implementation, of size S. Its storage can be viewed as an array of S bytes, so there must be a pointer to each of those bytes, and all of those pointers are comparable and distinct. Therefore the number of distinct values of type void* must be at least the largest value of type size_t, which is an unsigned integer type. Thus sizeof(void*) >= sizeof(size_t).
Is my reasoning making sense or not?
Is my reasoning making sense or not?
The problem with your reasoning is that you assume that the size of the largest object possible equals SIZE_MAX. But that's not true. If you do
void* p = malloc(SIZE_MAX);
you will (most likely) get a NULL pointer back.
You may also get warnings like:
main.cpp:48:15: warning: argument 1 value '18446744073709551615' exceeds maximum object size 9223372036854775807 [-Walloc-size-larger-than=]
48 | void* p = malloc(SIZE_MAX);
| ^~~~~~~~~~~~~~~~
Since the maximum object size isn't (always) SIZE_MAX you can't use the value of SIZE_MAX to argue about the size of pointers.
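As an aside, here is a minimal sketch (mine, not part of the original answer) that makes the gap visible: the 9223372036854775807 in the warning above equals PTRDIFF_MAX on that platform, which is typically about half of SIZE_MAX.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

int main(void)
{
    printf("SIZE_MAX    = %zu\n", (size_t)SIZE_MAX);
    printf("PTRDIFF_MAX = %td\n", (ptrdiff_t)PTRDIFF_MAX);  /* typically about SIZE_MAX / 2 */

    void *p = malloc(SIZE_MAX);              /* almost certainly returns NULL */
    printf("malloc(SIZE_MAX) -> %p\n", p);
    free(p);
    return 0;
}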
BTW: some CPUs that use 64-bit pointers at the software level may not implement all 64 bits at the hardware level. Instead, some of the high bits are simply required to be all-ones/all-zeros (as with canonical addresses on x86-64).
Related
I'm trying to get a better understanding of the C standard. In particular I am interested in how pointer arithmetic might work in an implementation for an unusual machine architecture.
Suppose I have a processor with 64 bit wide registers that is connected to RAM where each address corresponds to a cell 8 bits wide. An implementation of C for this machine defines CHAR_BIT to be equal to 8. Suppose I compile and execute the following lines of code:
char *pointer = 0;
pointer = pointer + 1;
After execution, pointer is equal to 1. This gives one the impression that in general data of type char corresponds to the smallest addressable unit of memory on the machine.
Now suppose I have a processor with 12 bit wide registers that is connected to RAM where each address corresponds to a cell 4 bits wide. An implementation of C for this machine defines CHAR_BIT to be equal to 12. Suppose the same lines of code are compiled and executed for this machine. Would pointer be equal to 3?
More generally, when you increment a pointer to a char, is the address equal to CHAR_BIT divided by the width of a memory cell on the machine?
Would pointer be equal to 3?
Well, the standard doesn't say how pointers are implemented. It tells you what is to happen when you use a pointer in a specific way, but not what the value of a pointer shall be.
All we know is that adding 1 to a char pointer will make the pointer point at the next char object, wherever that is. It says nothing about the pointer's numeric value.
So when you say that
pointer = pointer + 1;
will make the pointer equal 1, that is not guaranteed; the standard says nothing about the resulting value.
On most systems a char is 8 bits and pointers are (virtual) memory addresses referencing an 8-bit addressable memory location. On such systems, incrementing a char pointer will increase the pointer value (i.e. the memory address) by 1. However, on unusual architectures there is no way to tell in general.
But if you have a system where each memory address references 4 bits and a char is 12 bits, it seems a good guess that ++pointer will increase the pointer by three.
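By contrast, here is a quick sketch (mine, not from the answer) of what this looks like on a conventional byte-addressed machine, where the raw address moves by sizeof(*p) per increment:

#include <stdio.h>

int main(void)
{
    char c[2];
    int  i[2];

    /* Measure the raw address step by casting to char*. */
    printf("char pointer step: %td byte(s)\n", (char *)&c[1] - (char *)&c[0]);  /* 1 */
    printf("int  pointer step: %td byte(s)\n", (char *)&i[1] - (char *)&i[0]);  /* sizeof(int) */
    return 0;
}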
Pointers advance by at least the width of the datatype they "point to", but are not guaranteed to advance by exactly that width.
For memory-alignment purposes, there are many cases in which a pointer advances past the minimum width to the next word-alignment boundary.
So, in general, you cannot assume this pointer to be equal to 3. It very well may be 3, 4, or some larger number.
Here is an example.
struct char_three {
char a;
char b;
char c;
};
struct char_three* my_pointer = 0;
my_pointer++;
/* I'd be shocked if my_pointer was now 3 */
Memory alignment is machine-specific. One cannot generalize about it, except that most machines define a WORD as the unit the bus can fetch in a single aligned access. Some machines can specify addresses that don't align with the bus fetches. In such a case, selecting two bytes that span the alignment boundary may result in loading two WORDs.
Most systems don't accept WORD loads on non-aligned boundaries without complaining. This means that a bit of boilerplate assembly is applied to translate the fetch to the preceding WORD boundary, if maximum density is desired.
Most compilers prefer speed to maximum density of data, so they align their structured data to take advantage of WORD boundaries, avoiding the extra calculations. This means that in many cases, data that is not carefully aligned might contain "holes" of bytes that are not used.
If you are interested in details of the above summary, you can read up on Data Structure Alignment which will discuss alignment (and as a consequence) padding.
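To make those "holes" concrete, here is a small sketch (mine, not from the answer). The exact numbers are implementation-specific, but on a typical 32/64-bit compiler the int member lands at offset 4, not 1:

#include <stdio.h>
#include <stddef.h>   /* offsetof */

struct padded {
    char c;   /* 1 byte of data ...                              */
    int  i;   /* ... but usually aligned to 4, leaving a 3-byte hole */
};

int main(void)
{
    printf("sizeof(struct padded) = %zu\n", sizeof(struct padded));   /* typically 8 */
    printf("offset of c = %zu, offset of i = %zu\n",
           offsetof(struct padded, c), offsetof(struct padded, i));
    return 0;
}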
char *pointer = 0;
After execution, pointer is equal to 1
Not necessarily.
This special case gives you a null pointer, since 0 is a null pointer constant. Strictly speaking, such a pointer is not supposed to point at a valid object. If you look at the actual address stored in the pointer, it could be anything.
Null pointers aside, the C language expects you to do pointer arithmetic by first pointing at an array. Or in the case of char, you can also point at a chunk of generic data, such as a struct. Everything else, like your example, is undefined behavior.
An implementation of C for this machine defines CHAR_BIT to be equal to 12
The C standard defines char to be exactly one byte, so your example is a bit self-contradictory. Pointer arithmetic will always advance the pointer to point at the next object in the array. The standard doesn't really speak of the representation of addresses at all, but in your fictional example it would be sensible to increase the address by 12 bits, because that's the size of a char.
Fictional computers are quite meaningless to discuss even from a learning point-of-view. I'd advise to focus on real-world computers instead.
When you increment a pointer to a char, is the address equal to CHAR_BIT divided by the width of a memory cell on the machine?
On a "conventional" machine -- indeed on the vast majority of machines where C runs -- CHAR_BIT simply is the width of a memory cell on the machine, so the answer to the question is vacuously "yes" (since CHAR_BIT / CHAR_BIT is 1.).
A machine with memory cells smaller than CHAR_BIT would be very, very strange -- arguably incompatible with C's definition.
C's definition says that:
sizeof(char) is exactly 1.
CHAR_BIT, the number of bits in a char, is at least 8. That is, as far as C is concerned, a byte may not be smaller than 8 bits. (It may be larger, and this is a surprise to many people, but it does not concern us here.)
There is a strong suggestion (if not an explicit requirement) that char (or "byte") is the machine's "minimum addressable unit" or some such.
So for a machine that can address 4 bits at a time, we would have to pick unnatural values for sizeof(char) and CHAR_BIT (which would otherwise probably want to be 2 and 4, respectively), and we would have to ignore the suggestion that type char is the machine's minimum addressable unit.
C does not mandate the internal representation (the bit pattern) of a pointer. The closest a portable C program can get to doing anything with the internal representation of a pointer value is to print it out using %p -- and that's explicitly defined to be implementation-defined.
So I think the only way to implement C on a "4 bit" machine would involve having the code
char a[10];
char *p = a;
p++;
generate instructions which actually incremented the address behind p by 2.
It would then be an interesting question whether %p should print the actual, raw pointer value, or the value divided by 2.
It would also be lots of fun to watch the ensuing fireworks as too-clever programmers on such a machine used type punning techniques to get their hands on the internal value of pointers so that they could increment them by actually 1 -- not the 2 that "proper" additions of 1 would always generate -- such that they could amaze their friends by accessing the odd nybble of a byte, or confound the regulars on SO by asking questions about it. "I just incremented a char pointer by 1. Why is %p showing a value that's 2 greater?"
Seems like the confusion in this question comes from the fact that the word "byte" in the C standard doesn't have the typical definition (which is 8 bits). Specifically, the word "byte" in the C standard means a collection of bits, where the number of bits is specified by the implementation-defined constant CHAR_BIT. Furthermore, a "byte" as defined by the C standard is the smallest addressable object that a C program can access.
This leaves open the question as to whether there is a one-to-one correspondence between the C definition of "addressable", and the hardware's definition of "addressable". In other words, is it possible that the hardware can address objects that are smaller than a "byte"? If (as in the OP) a "byte" occupies 3 addresses, then that implies that "byte" accesses have an alignment restriction. Which is to say that 3 and 6 are valid "byte" addresses, but 4 and 5 are not. This is prohibited by section 6.2.8 which discusses the alignment of objects.
Which means that the architecture proposed by the OP is not supported by the C specification. In particular, an implementation may not have pointers that point to 4-bit objects when CHAR_BIT is equal to 12.
Here are the relevant sections from the C standard:
§3.6 The definition of "byte" as used in the standard
[A byte is an] addressable unit of data storage large enough to hold
any member of the basic character set of the execution environment.
NOTE 1 It is possible to express the address of each individual byte
of an object uniquely.
NOTE 2 A byte is composed of a contiguous sequence of bits, the number
of which is implementation-defined. The least significant bit is
called the low-order bit; the most significant bit is called the
high-order bit.
§5.2.4.2.1 describes CHAR_BIT as the
number of bits for smallest object that is not a bit-field (byte)
§6.2.6.1 restricts all objects that are larger than a char to be a multiple of CHAR_BIT bits:
[...]
Except for bit-fields, objects are composed of contiguous sequences of
one or more bytes, the number, order, and encoding of which are either
explicitly specified or implementation-defined.
[...] Values stored in non-bit-field objects of any other object type
consist of n × CHAR_BIT bits, where n is the size of an object of that
type, in bytes.
§6.2.8 restricts the alignment of objects
Complete object types have alignment requirements which place
restrictions on the addresses at which objects of that type may be
allocated. An alignment is an implementation-defined integer value
representing the number of bytes between successive addresses at which
a given object can be allocated.
Valid alignments include only those values returned by an _Alignof
expression for fundamental types, plus an additional
implementation-defined set of values, which may be empty. Every
valid alignment value shall be a nonnegative integral power of two.
§6.5.3.4 specifies the sizeof a char, and hence a "byte"
When sizeof is applied to an operand that has type char, unsigned
char, or signed char, (or a qualified version thereof) the result is
1.
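For what it's worth, C11 exposes these alignment values directly via _Alignof. A small sketch (mine), whose output is implementation-defined:

#include <stdio.h>

int main(void)
{
    /* Per the 6.2.8 text quoted above, each value below is an
       implementation-defined power of two (in bytes). */
    printf("_Alignof(char)   = %zu\n", _Alignof(char));   /* always 1 */
    printf("_Alignof(int)    = %zu\n", _Alignof(int));
    printf("_Alignof(double) = %zu\n", _Alignof(double));
    return 0;
}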
The following code fragment demonstrates an invariant of C pointer arithmetic -- no matter what CHAR_BIT is, no matter what the hardware least addressable unit is, and no matter what the actual bit representation of pointers is,
#include <assert.h>
typedef int T; // stand-in: the claim holds for any object type T whatsoever
int main(void)
{
    T x[2];
    assert(&x[1] - &x[0] == 1); // must be true
}
And since sizeof(char) == 1 by definition, this also means that
#include <assert.h>
typedef int T; // again, a stand-in for any object type T whatsoever
int main(void)
{
    T x[2];
    char *p = (char *)&x[0];
    char *q = (char *)&x[1];
    assert(q - p == sizeof(T)); // must be true
}
However, if you convert to integers before performing the subtraction, the invariant evaporates:
#include <assert.h>
#include <inttypes.h>
typedef int T; // stand-in once more
int main(void)
{
    T x[2];
    uintptr_t p = (uintptr_t)&x[0];
    uintptr_t q = (uintptr_t)&x[1];
    assert(q - p == sizeof(T)); // implementation-defined whether true
}
because the transformation performed by converting a pointer to an integer of the same size, or vice versa, is implementation-defined. I think it's required to be bijective, but I could be wrong about that, and it is definitely not required to preserve any of the above invariants.
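What the standard does guarantee (C99 7.18.1.4) is the round trip: converting a valid void* to uintptr_t and back yields a pointer that compares equal to the original. A sketch:

#include <assert.h>
#include <stdint.h>

int main(void)
{
    int x;
    uintptr_t u = (uintptr_t)(void *)&x;  /* implementation-defined mapping */
    void *back  = (void *)u;
    assert(back == (void *)&x);           /* this round trip must hold */
    return 0;
}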
Is there any limit as to how many elements a 2D integer array can contain in C?
PS: I was expecting there to be some space limitation on declaring an array, but could not find any such reference on the internet.
It depends on your RAM, or rather the memory available to you.
For example, my program used to crash when I declared a global array a[100000][10000], but that declaration is fine on the system I have now.
The size_t type is defined to be large enough to contain the size of any object in the program, including arrays. So the largest possible array size can be described as 2^(8 * sizeof(size_t)) bytes.
For convenience, this value can be obtained through the SIZE_MAX constant in stdint.h. It is guaranteed to be at least 65535, but is realistically a larger value: most likely 2^32 - 1 on 32-bit systems and 2^64 - 1 on 64-bit systems.
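To check the value on your own implementation, a trivial sketch:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    printf("SIZE_MAX = %zu\n", (size_t)SIZE_MAX);  /* e.g. 18446744073709551615 on 64-bit */
    return 0;
}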
Maximum imposed by the C/C++ standard: x * y * z <= SIZE_MAX, where SIZE_MAX is implementation defined, x is one dimension of the array, y is the other dimension, and z is the size of the element in bytes. e.g. element_t A[x][y], z = sizeof(element_t).
The C99 standard suggests that the type size_t is large enough to store the size of any object, as it is the resulting type of the sizeof operator.
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. ...
The value of the result ... is implementation-defined, and its type (an unsigned integer type) is size_t, defined in <stddef.h> (and other headers).
Since SIZE_MAX (from <stdint.h>) is defined to be the largest value that a size_t can store, it should follow that the largest object would have a size equal to SIZE_MAX. That would in fact be helpful, but alas it seems as though we'd be asking quite a lot to allocate anything even one quarter that size.
Are there any implementations where you can actually declare (or otherwise allocate) an object as large as SIZE_MAX?
It certainly doesn't seem to be the common case... In C11 the optional rsize_t type and its corresponding RSIZE_MAX macro were introduced. It's supposed to be a runtime constraint if any standard C function is used with a value greater than RSIZE_MAX as an rsize_t argument. This seems to imply that the largest object might be RSIZE_MAX bytes. However, this doesn't seem to be widely supported, either!
Are there any implementations where RSIZE_MAX exists and you can actually declare (or otherwise allocate) an object as large as RSIZE_MAX?
I think all C implementations will allow you to declare an object of that size. The OS may refuse to load the executable, as there isn't enough memory.
Likewise, all C runtime libraries will allow you to attempt to allocate that size of memory, however, it will probably fail as there isn't that much memory, neither virtual nor real.
Just think: if size_t is a type equal to the machine word size (32 bits, 64 bits), then the highest addressable memory cell (byte) is 2^32 (or 2^64). Given that low memory holds interrupt vectors, the BIOS, and the OS, not to mention the code and data of your program, this full amount of memory is never available.
Oh, now I see the issue. Assuming a realloc growth factor of ~1.5, there are two integer limits. The first is SIZE_MAX/3, because above that size*3/2 will overflow. But at sizes that large the low bits are insignificant, so we can reverse the operator order to size/2*3 and still grow by ~1.5 (which imposes the second limit, SIZE_MAX/3*2). Past that, resort to SIZE_MAX itself.
After that, we just have to binary-search the amount that the system can actually allocate (in the range from the result of growing down to the minimal required size).
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>   /* realloc */

int
grow(char **data_p, size_t *size_p, size_t min)
{
    size_t size = *size_p;
    /* Grow by ~1.5x until min is reached. Sizes 0 and 1 are seeded to 2,
       since they would never grow under size * 3 / 2. */
    while (size < min)
        size = (size <= 1 ? 2 :
                size <= SIZE_MAX / 3 ? size * 3 / 2 :
                size <= SIZE_MAX / 3 * 2 ? size / 2 * 3 : SIZE_MAX);
    if (size != *size_p) {
        /* Walk ext down from (size - min) toward 0, so the requests go
           from the grown size down to the minimal required size. */
        size_t ext = size - min;
        char *data;
        for (;; ext /= 2)
            if ((data = realloc(*data_p, min + ext)) || ext == 0)
                break;
        if (data == NULL) return -1; // ENOMEM
        *data_p = data;
        *size_p = min + ext;
    }
    return 0;
}
And no OS-dependent or manual limits!
As you can see, the original question is a consequence of a [probably] imperfect implementation that doesn't treat edge cases with care. It doesn't matter whether the systems in question do or don't exist now; any algorithm that is supposed to work near the integer limits should take proper care of them.
(Please note that the above code assumes char data and does not account for other element sizes; implementing that may require more complex checks.)
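A usage sketch of the grow() above (my addition, relying on the names defined there); realloc(NULL, n) behaves like malloc(n), so starting from an empty buffer is fine:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int grow(char **data_p, size_t *size_p, size_t min);  /* from above */

int main(void)
{
    char  *buf  = NULL;
    size_t size = 0;

    if (grow(&buf, &size, 100) == 0) {  /* need at least 100 bytes */
        memset(buf, 0, size);
        printf("grew to %zu bytes\n", size);
    }
    free(buf);
    return 0;
}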
Suppose I'm given a function with the following signature:
void SendBytesAsync(unsigned char* data, T length)
and I need a buffer large enough to hold a byte array of the maximum length that can be specified by type T. How do I declare that buffer? I can't just use sizeof as it will return the size (in bytes) of type T and not the maximum value that the type could contain. I don't want to use limits.h as the underlying type could change and my buffer be too small. I can't use pow from math.h because I need a constant expression. So how do I get a constant expression for the maximum size of a type at compile time in C?
Edit
The type will be unsigned. Since everyone seems to be appalled at the idea of a statically allocated buffer determined at compile time, I'll provide a little background. This is for an embedded application (on a microcontroller) where reliability and speed are the priorities. As such, I'm perfectly OK with wasting statically assigned memory for the sake of run-time integrity (no malloc issues) and performance (no overhead for memory allocation each time I need the buffer).
I understand the risk that if the max size of T is too large, my linker will not be able to allocate a buffer that big, but that will be a compile-time failure, which can be accommodated, rather than a run-time failure, which cannot be tolerated. If, for example, I use size_t for the size of the payload and allocate the memory dynamically, there is a very real possibility that the system will not have that much memory available. I would much rather know this at compile time than at run time, where it will result in packet loss, data corruption, etc.
Looking at the function signature I provided, it is ridiculous to provide a type as a size parameter for a dynamically allocated buffer and not expect the possibility that a caller will use the max value of the type. So I'm not sure why there seems to be so much consternation about allocating that memory once, for good. I can see this being a huge problem in the Windows world, where multiple processes are fighting for the same memory resources, but in the embedded world there's only one task to be done, and if you can't do that effectively, then it doesn't matter how much memory you saved.
Use _Generic:
#define MAX_SIZE(X) _Generic((X), \
    long: LONG_MAX,               \
    unsigned long: ULONG_MAX,     \
    /* ... */)
Prior to C11 there isn't a portable way to find an exact maximum value of an object of type T (all calculations with CHAR_BIT, for example, may yield overestimates due to padding bits).
Edit: Do note that under certain conditions (think segmented memory in real-life situations) you might not be able to allocate a buffer large enough to equal the maximum value of any given type T.
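Here is a self-contained sketch of that macro in use (my addition; I've filled in a few branches for illustration, and the list would be extended to whatever types T may actually be):

#include <stdio.h>
#include <limits.h>

#define MAX_SIZE(X) _Generic((X),  \
    unsigned char:  UCHAR_MAX,     \
    unsigned short: USHRT_MAX,     \
    unsigned int:   UINT_MAX,      \
    unsigned long:  ULONG_MAX)

int main(void)
{
    unsigned short len = 0;
    printf("largest length: %lu\n", (unsigned long)MAX_SIZE(len));  /* 65535 */
    return 0;
}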
if T is unsigned, then would ((T) -1) work?
(This is probably really bad, and if so, please let me know why :-) )
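Yes, for an unsigned T it does work: unsigned conversion is defined modulo 2^N, so (T)-1 is exactly the maximum value of T, and it is a constant expression. A sketch (my addition; the choice of T here is arbitrary):

#include <stdio.h>

typedef unsigned short T;             /* hypothetical stand-in */

static unsigned char buffer[(T)-1];   /* (T)-1 == maximum value of T */

int main(void)
{
    printf("buffer holds %zu bytes\n", sizeof buffer);  /* 65535 if T is 16-bit */
    return 0;
}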
Is there a reason why you are allocating the maximum possible buffer size instead of a buffer that is only as large as you need? Why not have the caller simply specify the amount of memory needed?
Recall that the malloc() function takes an argument of type size_t. That means that (size_t)(-1) (which is SIZE_MAX in C99 and later) will represent the largest value that can be passed to malloc. If you are using malloc as your allocator, then this will be your absolute upper limit.
Maybe try using a bit shift?
Let's see:
unsigned long max_size = (1UL << (8 * sizeof(T))) - 1;
sizeof(T) gives you the number of bytes T occupies in memory. (That count includes any padding the compiler inserts inside a struct for alignment, so it can be larger than the sum of the members.)
Breaking it down:
8 * sizeof(T) gives you the number of bits that size represents
1 << x is the same as saying 2 to the x power. Because every time you shift to the left, you're multiplying by two. Just as every time you shift to the left in base 10, you are multiplying by 10.
- 1: an 8-bit number can hold 256 values, 0..255, so after computing the power of two you subtract 1 to get the maximum representable value.
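One caveat with the shift: if T is as wide as the shifted operand, shifting by the full width is undefined behavior. Here is a variant that sidesteps that (a sketch of mine, assuming T is unsigned with no padding bits):

#include <stdio.h>
#include <limits.h>   /* CHAR_BIT */

typedef unsigned int T;   /* hypothetical stand-in */

int main(void)
{
    /* Shift by width-1, then finish with * 2 + 1:
       ((2^(w-1) - 1) * 2) + 1 == 2^w - 1, with no full-width shift. */
    T max = (((T)1 << (CHAR_BIT * sizeof(T) - 1)) - 1) * 2 + 1;
    printf("max of T = %lu\n", (unsigned long)max);
    return 0;
}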
Interesting question. I would start by looking in the limits header for the max value of a numeric type T. I have not tried it, but I would do something that uses std::numeric_limits<T>::max() (note that this is C++, not C).
Say we have the following code:
#include <stdio.h>

int main(){
    int a[3]={1,2,3};
    printf("      E: 0x%x\n", a);
    printf("  &E[2]: 0x%x\n", &a[2]);
    printf("&E[2]-E: 0x%x\n", &a[2] - a);
    return 1;
}
When compiled and run, the results are as follows:
E: 0xbf8231f8
&E[2]: 0xbf823200
&E[2]-E: 0x2
I understand the result of &E[2], which is 8 plus the array's address, since the index is 2 and the type is int (4 bytes on my 32-bit system), but I can't figure out why the last line is 2 instead of 8.
In addition, what should the type of the last line be: an integer, or an integer pointer?
I wonder if it is the C type system (some kind of implicit casting) that makes this quirk?
You have to remember what the expression a[2] really means. It is exactly equivalent to *(a+2). So much so, that it is perfectly legal to write 2[a] instead, with identical effect.
For that to work and make sense, pointer arithmetic takes into account the type of the thing pointed at. But that is taken care of behind the scenes. You get to simply use natural offsets into your arrays, and all the details just work out.
The same logic applies to pointer differences, which explains your result of 2.
Under the hood, in your example the index is multiplied by sizeof(int) to get a byte offset which is added to the base address of the array. You expose that detail in your two prints of the addresses.
When subtracting pointers of the same type the result is number of elements and not number of bytes. This is by design so that you can easily index arrays of any type. If you want number of bytes - cast the addresses to char*.
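A sketch of both forms side by side (my addition; note that the proper printf conversion for a pointer difference is %td, since the result has type ptrdiff_t):

#include <stdio.h>

int main(void)
{
    int a[3] = {1, 2, 3};

    printf("elements: %td\n", &a[2] - &a[0]);                  /* 2 */
    printf("bytes:    %td\n", (char *)&a[2] - (char *)&a[0]);  /* 2 * sizeof(int) */
    return 0;
}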
When you increment a pointer by 1 (p + 1), the pointer points to the next valid address of its type, i.e. (p + sizeof(Type)) bytes past p (if Type is int, then p + sizeof(int) bytes).
Similar logic holds for p - 1 (with subtraction, of course).
If you just apply those principles here:
In simple terms:
a[2] can be represented as *(a + 2), so &a[2] is (a + 2)
&a[2] - a ==> (a + 2) - (a) ==> 2
So, behind the scenes, the subtraction effectively computes
&a[2] - &a[0]
==> { (a + 2 * sizeof(int)) - (a + 0) } / sizeof(int)
==> 2 * sizeof(int) / sizeof(int) ==> 2
The line &E[2]-E is doing pointer subtraction, not integer subtraction. Pointer subtraction (when both pointers point to data of the same type) returns the difference of the addresses divided by the size of the type they point to. The result is a signed integer (formally of type ptrdiff_t).
To answer your "update" question, once again pointer arithmetic (this time pointer addition) is being performed. It's done this way in C to make it easier to "index" a chunk of contiguous data pointed to by the pointer.
You may be interested in Pointer Arithmetic In C question and answers.
basically, + and - operators take element size into account when used on pointers.
When adding and subtracting pointers in C, you use the size of the data type rather than absolute addresses.
If you have an int pointer and add the number 2 to it, it will advance 2 * sizeof(int). In the same manner, if you subtract two int pointers, you will get the result in units of sizeof(int) rather than the difference of the absolute addresses.
(Having pointer arithmetic work in units of the data type is quite convenient; you can, for example, simply write p++ instead of having to spell out the size of the type every time, as in p += sizeof(int).)
Re: "In addtion,what type of the last line should be?An integer,or a integer pointer??"
an integer/number. by the same token that the: Today - April 1 = number. not date
If you want to see the byte difference, you'll have to cast to a type that is 1 byte in size, like this:
printf("&E[2]-E:\t0x%x\n", (char*)(&a[2]) - (char*)(&a[0]));