This code runs on Turbo C but not on the gcc compiler? - c

This code runs on Turbo C but not on the gcc compiler.
Error: syntax error before '*' token
#include <stdio.h>
int main()
{
    char huge *near *far *ptr1;
    char near *far *huge *ptr2;
    char far *huge *near *ptr3;
    printf("%d, %d, %d\n", sizeof(ptr1), sizeof(ptr2), sizeof(ptr3));
    return 0;
}
Turbo C output is: 4, 4, 2
Can you explain the Output on Turbo C?

The qualifiers huge, far and near are non-standard. So, while they might work in Turbo C, you can't rely on them working in other compilers (such as gcc).

Borland's C/C++ compilers for DOS supported multiple memory models.
A memory model is a way to access code and data through pointers.
Since DOS runs in the so-called real mode of the CPU, in which memory is accessed through pairs of a segment value and an offset value (each normally 16 bits long), a full memory address is naturally 4 bytes long.
But segment values need not always be specified explicitly. If everything a program needs to access is contained within one segment (a 64KB block of memory aligned on a 16-byte boundary), a single segment value is enough, and once it's loaded into the CPU's segment registers (CS, SS, DS, ES), the program can access everything using only 16-bit offsets. Btw, many .COM-type programs work exactly like that: they use only one segment.
So, there you have 2 possible ways to access memory, with an explicit segment value or without.
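For concreteness, the real-mode linear address is computed as segment * 16 + offset, so many different segment:offset pairs alias the same location. A small sketch of that arithmetic (the example values are arbitrary):

#include <stdio.h>
#include <inttypes.h>

/* Real-mode address translation: linear = segment * 16 + offset. */
static uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

int main(void)
{
    /* Two different segment:offset pairs that alias the same linear address. */
    printf("1234:0010 -> 0x%05" PRIX32 "\n", linear_address(0x1234, 0x0010)); /* 0x12350 */
    printf("1235:0000 -> 0x%05" PRIX32 "\n", linear_address(0x1235, 0x0000)); /* 0x12350 */
    return 0;
}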
In these lines:
char huge *near *far *ptr1;
char near *far *huge *ptr2;
char far *huge *near *ptr3;
the modifiers far, huge and near specify the proximity of the objects that ptr1, ptr2 and ptr3 will point to. They tell the compiler that the *ptr1 and *ptr2 objects will be "far away" from the program's main/current segment(s), that is, in some other segments, and therefore need to be accessed through 4-byte pointers, while the *ptr3 object is "near", within the program's own segment(s), so a 2-byte pointer is sufficient.
This explains the different pointer sizes.
Depending on the memory model you choose to compile your program with, function and data pointers will default to near, far or huge, sparing you from spelling them out explicitly unless you need non-default pointers.
The program memory models are:
tiny: 1 segment for everything; near pointers
small: 1 code segment, 1 data/stack segment; near pointers
medium: multiple code segments, 1 data/stack segment; far code pointers, near data pointers
compact: 1 code segment, multiple data segments; near code pointers, far data pointers
large: multiple code and data segments; far pointers
huge: multiple code and data segments; huge pointers
Huge pointers don't have certain limitations of far pointers, but are slower to operate with.
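As an aside, a version of the question's snippet written in standard C (with the non-standard qualifiers removed) compiles under gcc; on a flat-memory platform all three pointers then have the same size:

#include <stdio.h>

int main(void)
{
    /* Plain nested pointers - no near/far/huge in standard C. */
    char ***ptr1;
    char ***ptr2;
    char ***ptr3;
    /* sizeof yields a size_t, so use %zu rather than %d. */
    printf("%zu, %zu, %zu\n", sizeof ptr1, sizeof ptr2, sizeof ptr3);
    return 0;
}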

You forgot to put a comma between the variables :).
Variables cannot have the same name if their scope is the same.

Related

allocating from stack - data alignment issues in C

In another post, I asked a coding question and in the source code to that question, I declared some variables in the following manner:
char datablock[200];
char *pointer1=datablock;
char *pointer2=datablock+100;
However, someone mentioned that the code may be incompatible with 64-bit systems because 100 isn't divisible by 8? I can't remember exactly what it was.
But what I want to do is reserve a huge chunk of memory for use with my program and make it execute as fast as possible, and I remember that, because of the way CPU caching works, using data from the same block of memory is faster than using data from separate blocks. Using malloc is also asking for slower memory.
So, in code, this is an example of what I want to do. I want to allocate 40,000 bytes and give 4 pointers access to 10,000 bytes each:
char data[40000];
char *string0=data;
char *string1=data+10000;
char *string2=data+20000;
char *string3=data+30000;
This, however, is not what I want to do, as I believe different sections of memory will be accessed:
char string0[10000];
char string1[10000];
char string2[10000];
char string3[10000];
I believe my idea is correct, but is the only thing I need to be concerned about that, for 64-bit systems, the offset value should be a multiple of 8, and for 32-bit systems a multiple of 4?
I don't want to pick wrong numbers and receive segmentation faults.
The alignment problems that may arise are related to storing something that has an alignment requirement at an address that does not satisfy that requirement.
This is not your case. You are not storing pointers in unaligned addresses, you are just storing addresses in aligned pointers.
Just to make it clear:
char *pointer2=datablock+100;
This declares a pointer which could live on the stack or in a register, depending on how this is compiled, but the allocation of space for the pointer itself is up to the compiler, which will do it correctly for the underlying architecture.
The problem can arise when you do something like:
int *asInteger = (int *)(datablock + 1);
*asInteger = 10;
In this situation you are trying to store a value of a type that has an alignment requirement (int) at an address which could be unaligned with respect to the requirements of int.
In any case, if I remember correctly, the x86 architecture allows this to work, but it is slower.
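If you ever do need to carve typed objects out of a raw byte buffer, a minimal sketch of the defensive approach (assuming C11 for _Alignas/_Alignof; the helper name is illustrative) is to round each offset up to the target type's alignment:

#include <stdio.h>
#include <stddef.h>

/* Round offset up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

int main(void)
{
    /* Make sure the buffer itself starts suitably aligned for any type. */
    static _Alignas(max_align_t) unsigned char datablock[200];

    /* Offset 100 is fine for a char pointer; round it up before using it as an int *. */
    size_t off = align_up(100, _Alignof(int));
    int *p = (int *)(datablock + off);
    *p = 10;
    printf("offset used: %zu, value: %d\n", off, *p);
    return 0;
}

(On a typical ABI _Alignof(int) is 4 and 100 is already a multiple of 4, which is why the original offsets were harmless.)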
Whether the system is 32- or 64-bit will not cause the code you mention to have a segmentation fault. In this example of pointer arithmetic:
char *pointer2=datablock+100;
you are saying: from the address pointed to by datablock, advance 100 times the size of a char. The number you advance by doesn't have to be a multiple of any other number.
Regarding the last code snippet, it's likely the 4 sections of memory will be consecutive on the stack, but this is not guaranteed.
You can verify and see what's happening by printing pointer addresses. E.g.
printf("string0 %p\n", string0);
printf("string1 %p\n", string1);
printf("string2 %p\n", string2);
printf("string3 %p\n", string3);

Buffer overflow - order of local variables on stack [duplicate]

This question already has answers here:
Order of local variable allocation on the stack
(10 answers)
Closed 4 years ago.
I'm quite confused about how the local variables are ordered on the stack. I understand that (on Intel x86) local variables are stored from higher to lower addresses in the order they appear in the code. So it's clear that this code:
int i = 0;
char buffer[4];
strcpy(buffer, "aaaaaaaaaaaaaaa");
printf("%d", i);
produces something like this:
1633771873
The i variable was overwritten by the overflowed buffer.
However, if I swap the first two lines:
char buffer[4];
int i = 0;
strcpy(buffer, "aaaaaaaaaaaaaaa");
printf("%d", i);
the output is exactly the same.
How is this possible? i's address is lower than buffer's, so an overflow of the buffer should overwrite other data, but not i. Or am I missing something?
There is no rule about the order of local variables, so the compiler is generally free to allocate them the way it likes. But on the other hand, there are many strategies a compiler will use to reduce the chances of exactly the kind of overwrite you are deliberately trying to provoke.
One of those safety enhancements is to allocate buffers far from scalar variables, because an array can be indexed out of bounds and is more likely to clobber adjacent variables. Another trick is to add some empty guard space after arrays, to create a kind of isolation against bounds problems.
Anyway, you can use the debugger to look at the assembly and confirm the positioning of the variables.
If you want to look at how the local variables are allocated by the compiler try compiling with gcc -S which will output the assembly code. On the assembly code you can see how the compiler has chosen to order the variables.
One thing to keep in mind about how the compiler orders local variables is alignment: each char only needs to be aligned by 1 (it can start at any byte of memory), while an int typically has to be aligned by 4 (it can only start at a byte address evenly divisible by 4). To avoid wasting bytes on padding, the compiler often groups together variables of similar type. So even if you define them like this:
int a;
char c;
int b;
char d;
It is likely that the compiler has grouped the ints and chars together in memory, so the layout, going from low memory at the top to high memory at the bottom, might look something like:
low memory
|        |        | char d | char c |
|               int b               |
|               int a               |
high memory
Each cell between | bars represents one byte, and an entire line represents 4 bytes.
Try messing around with the assembly code sometime; it is pretty interesting.
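As a quick experiment to go with the diagram above (results will vary with compiler, optimization level, and stack-protector settings; this is just a sketch), you can also simply print the addresses of the locals:

#include <stdio.h>

int main(void)
{
    int a;
    char c;
    int b;
    char d;
    /* The compiler is free to order (and pad) these however it likes. */
    printf("a: %p\n", (void *)&a);
    printf("c: %p\n", (void *)&c);
    printf("b: %p\n", (void *)&b);
    printf("d: %p\n", (void *)&d);
    return 0;
}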

Can an address be assigned to a variable in C?

Is it possible to assign a variable the address you want in memory?
I tried to do so but I am getting an error as "Lvalue required as left operand of assignment".
int main() {
    int i = 10;
    &i = 7200;
    printf("i=%d address=%u", i, &i);
}
What is wrong with my approach?
Is there any way in C in which we can assign an address we want, to a variable?
Not directly.
You can do this, though: int *i = (int *)7200;
.. and then use i (i.e. *i = 10), but you will most likely get a crash. This is only meaningful when doing low-level development - device drivers, etc... with known memory addresses.
Assuming you are on an x86-type processor on a modern operating system, it is not possible to write to arbitrary memory locations; the CPU works in concert with the OS to protect memory so that one process cannot accidentally (or intentionally) overwrite another process's memory. Allowing this would be a security risk (see: buffer overflow). If you try anyway, you get the 'Segmentation fault' error as the OS/CPU prevents you from doing this.
For technical details on this, you want to start with 1, 2, and 3.
Instead, you ask the OS to give you a memory location you can write to, using malloc. In this case, the OS kernel (which is generally the only process that is allowed to write to arbitrary memory locations) finds a free area of memory and allocates it to your process. The allocation process also marks that area of memory as belonging to your process, so that you can read it and write it.
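For example, a minimal sketch of the sanctioned approach - letting the allocator pick the address instead of choosing one yourself:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Ask the allocator for writable memory instead of hard-coding an address. */
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return 1;
    *p = 10;
    printf("i=%d address=%p\n", *p, (void *)p);
    free(p);
    return 0;
}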
However, a different OS/processor architecture/configuration could allow you to write to an arbitrary location. In that case, this code would work:
#include <stdio.h>

int main(void) {
    int *ptr;
    ptr = (int *)7000;
    *ptr = 10;
    printf("Value: %i", *ptr);
    return 0;
}
The C language provides no means of "attaching" a name to a specific memory address. That is, you cannot tell the language that a specific variable name is supposed to refer to an lvalue located at a specific address. So, the answer to your question, as stated, is "no". End of story.
Moreover, formally speaking, there's no alternative portable way to work with specific numerical addresses in C. The language itself defines no features that would help you do that.
However, a specific implementation might provide you with means to access specific addresses. In a typical implementation, converting an integral value A to a pointer type creates a pointer that points to address A. By dereferencing such a pointer you can access that memory location.
Not portably. But some compilers (usually for the embedded world) have extensions to do it.
For example on IAR compiler (here for MSP430), you can do this:
static const char version[] @ 0x1000 = "v1.0";
This will put object version at memory address 0x1000.
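GCC-based embedded toolchains have a similar, if more indirect, mechanism: place the object in a named section with the section attribute, then locate that section at the desired address in the linker script. A sketch, with a hypothetical section name:

/* Hypothetical section name; the matching linker script entry must place
   the .version_info section at the desired address (e.g. 0x1000). */
static const char version[] __attribute__((section(".version_info"))) = "v1.0";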
You can do this on Windows with a mingw64 setup in the Visual Studio Code tool; here is my code:
#include <stdio.h>
int main()
{
    int *c;
    c = (int *)0x000000000061fe14; // Hard-coded address that happened to be valid for this particular build
    *c = 0x10;                     // Assign a hex value (you can also assign an integer without '0x')
    printf("%p\n", (void *)c);     // Prints the address held in the pointer variable c
    printf("%x\n", *c);            // Prints the assigned value 0x10 in hex
}
It was tested in the environment mentioned above. Hope this helps, happy coding!
No.
Even if you could, 7200 is not a pointer (memory address), it's an int, so that wouldn't work anyway.
There's generally no way to choose which address a variable will get. But as a last hope for you, there is something called a "pointer", so you can modify the value at address 7200 (although this address will probably be inaccessible):
int *i = (int *)7200;
*i = 10;
Use an ldscript/linker command file. This will, however, assign the address at link time, not run time.
Linker command file syntax depends largely on the specific compiler, so you will need to google for the linker command file format for your compiler.
Approximate pseudo-syntax would be somewhat like this:
In linker command file:
.section start=0x1000 length=0x100 myVariables
In C file:
#pragma section myVariables
int myVar=10;
It's not possible in standard C, though it may be with compiler extensions. You could, however, access memory at an address you want (if the address is accessible to your process):
int addr = 7200;
*((int*)addr) = newVal;
I think the '&' in &i evaluates to the address of i at compile time, which is a virtual address, so it is not an lvalue according to your compiler. Use a pointer instead.

C pointers and the physical address

I'm just starting C. I have read about pointers in various books/tutorials and I understand the basics. But one thing I haven't seen explained is what the numbers actually are.
For example:
#include <stdio.h>

int main(){
    int anumber = 10;
    int *apointer;
    apointer = &anumber;
    printf("%i", &apointer);
}
may return a number like 4231168. What does this number represent? Is it some storage designation in the RAM?
Lots of PC programmer replies as always. Here is a reply from a generic programming point-of-view.
You will be quite interested in the actual numerical value of the address when doing any form of hardware-related programming. For example, you can access hardware registers in a computer in the following way:
#define MY_REGISTER (*(volatile unsigned char*)0x1234)
This code assumes you know that there is a specific hardware register located at address 0x1234. All addresses in a computer are by tradition/for convenience expressed in hexadecimal format.
In this example, the address is 16 bits long, meaning that the address bus on the computer used is 16 bits wide. Every memory cell in your computer has an address, so on a 16-bit address bus you could have a maximum of 2^16 = 65536 addressable memory cells.
On a PC, for example, the address would typically be 32 bits long, giving you about 4.29 billion addressable memory cells, i.e. 4 GiB.
To explain that macro in detail:
0x1234 is the address of the register / memory location.
We need to access this memory location through a pointer, so we typecast the integer constant 0x1234 into an unsigned char pointer, i.e. a pointer to a byte.
This assumes that the register we are interested in is 1 byte large. Had it been two bytes large, we would perhaps have used unsigned short instead.
Hardware registers may update themselves at any time (their contents are "volatile"), so the program can't be allowed to make any assumptions or optimizations about what's stored inside them. The program has to read the value from the register every single time the register is used in the code. To enforce this behavior, we use the volatile keyword.
Finally, we want to access the register just as if it was a plain variable. Therefore the * is added, to take the contents of the pointer.
Now the specific memory location can be accessed by the program:
MY_REGISTER = 1;
unsigned char var = MY_REGISTER;
For example, code like this is used everywhere in embedded applications.
(But as already mentioned in other replies, you can't do things like this on modern PCs, since they use something called virtual addressing, giving you a slap on the fingers should you attempt it.)
It's the address or location of the memory to which the pointer refers. However, it's best if you regard this as an opaque quantity - you are never interested in the actual value of the pointer, only that to which it refers.
How the address then relates to physical memory is a service that the system provides and actually varies across systems.
That's a virtual address of the anumber variable. Every program has its own memory space, and that memory space is mapped to physical memory. The mapping is done by the processor, and the service data used for it is maintained by the operating system. So your program never knows where it is in physical memory.
It's the address of the memory location where your variable is stored. You shouldn't care about the exact value; you should just know that different variables have different addresses, that "contiguous memory" (e.g. arrays) has contiguous addresses, ...
By the way, to print the address stored in a pointer you should use the %p specifier in printf.
Notice that I did not say "RAM", because in most modern OSes the "memory" your process sees is virtual memory, i.e. an abstraction of the actual RAM managed by the OS.
A lot of people have told you that the numeric value of a pointer designates its address. This is one way implementations can do it, but it is very important what the C standard has to say about pointers:
The null pointer always has numeric value 0 when operated on in the C programming language. However, the actual memory containing the pointer may have any value, as long as this special, architecture-dependent value is consistently treated as null, and the implementation takes care that this value is seen as 0 by C source code. This is important to know, since null pointers may appear as a different value on certain architectures when inspected with a low-level memory debugger.
There's no requirement whatsoever that the values of pointers are in any way related to actual addresses. They may just as well be abstract identifiers, resolved by a LUT or similar.
If a pointer addresses an array, the rules of pointer arithmetic must hold, i.e. given int array[128];, the difference &array[120] - &array[100] must equal 20, and array + 20 == &array[20].
Pointer arithmetic between pointers to different objects is undefined and causes undefined behaviour.
The mapping pointer to integer must be reversible, i.e. if a number corresponds to a valid pointer, the conversion to this pointer must be valid. However (pointer) arithmetic on the numerical representation of pointers to different objects is undefined.
Yes, exactly that - it's the address of the apointer data in memory. Local variables such as anumber and apointer will be allocated on your program's stack, so it will refer to an address in the main() function's frame on the stack.
If you had allocated the memory with malloc() instead it would refer to a position in your program's heap space. If it was a fixed string it may refer to a location in your program's data or rodata (read-only data) segments instead.
In this case &apointer represents the address in RAM of the pointer variable apointer.
apointer is the "address" of the variable anumber. In theory, it could be the actual physical place in RAM where the value of anumber is stored, but in reality (on most OS'es) it's likely to be a place in virtual memory. The result is the same though.
It's a memory address, most likely to the current location in your program's stack. Contrary to David's comment, there are times when you'll calculate pointer offsets, but this is only if you have some kind of array that you are processing.
It's the address of the pointer.
"anumber" takes up some space in RAM, the data at this spot contains the number 10.
"apointer" also takes up some space in RAM, the data at this spot contains the location of "anumber" in RAM.
So, say you have 32 bytes of ram, addresses 0..31
At e.g. position 16 you have 4 bytes, the "anumber" value 10
At e.g. position 20 you have 4 bytes, the "apointer" value 16, "anumber"'s position in RAM.
What you print is 20, apointer's position in RAM.
Note that this isn't really directly in RAM, it's in virtual address space which is mapped to RAM. For the purpose of understanding pointers you can completely ignore virtual address space.
It is not the address of the variable anumber that is printed, but the address of the pointer itself. Look carefully: had it been just "apointer", then we would have seen the address of the anumber variable.
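To make the distinction the answers are drawing concrete, here is a small sketch that prints all three values (using %p, as recommended above):

#include <stdio.h>

int main(void)
{
    int anumber = 10;
    int *apointer = &anumber;

    printf("&anumber  = %p\n", (void *)&anumber);  /* address of the int */
    printf("apointer  = %p\n", (void *)apointer);  /* the same address, stored in the pointer */
    printf("&apointer = %p\n", (void *)&apointer); /* address of the pointer variable itself */
    return 0;
}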

Pointer implementation details in C

I would like to know architectures which violate the assumptions I've listed below. Also, I would like to know if any of the assumptions are false for all architectures (that is, if any of them are just completely wrong).
sizeof(int *) == sizeof(char *) == sizeof(void *) == sizeof(func_ptr *)
The in-memory representation of all pointers for a given architecture is the same regardless of the data type pointed to.
The in-memory representation of a pointer is the same as an integer of the same bit length as the architecture.
Multiplication and division of pointer data types are only forbidden by the compiler. NOTE: Yes, I know this is nonsensical. What I mean is - is there hardware support to forbid this incorrect usage?
All pointer values can be casted to a single integer. In other words, what architectures still make use of segments and offsets?
Incrementing a pointer is equivalent to adding sizeof(the pointed data type) to the memory address stored by the pointer. If p is an int32* then p+1 is equal to the memory address 4 bytes after p.
I'm most used to pointers being used in a contiguous, virtual memory space. For that usage, I can generally get by thinking of them as addresses on a number line. See Stack Overflow question Pointer comparison.
I can't give you concrete examples of all of these, but I'll do my best.
sizeof(int *) == sizeof(char *) == sizeof(void *) == sizeof(func_ptr *)
I don't know of any systems where I know this to be false, but consider:
Mobile devices often have some amount of read-only memory in which program code and such is stored. Read-only values (const variables) may conceivably be stored in read-only memory. And since the ROM address space may be smaller than the normal RAM address space, the pointer size may be different as well. Likewise, pointers to functions may have a different size, as they may point to this read-only memory into which the program is loaded, and which can otherwise not be modified (so your data can't be stored in it).
So I don't know of any platforms on which I've observed that the above doesn't hold, but I can imagine systems where it might be the case.
The in-memory representation of all pointers for a given architecture is the same regardless of the data type pointed to.
Think of member pointers vs regular pointers. They don't have the same representation (or size). A member pointer consists of a this pointer and an offset.
And as above, it is conceivable that some CPU's would load constant data into a separate area of memory, which used a separate pointer format.
The in-memory representation of a pointer is the same as an integer of the same bit length as the architecture.
Depends on how that bit length is defined. :)
An int on many 64-bit platforms is still 32 bits. But a pointer is 64 bits.
As already said, CPU's with a segmented memory model will have pointers consisting of a pair of numbers. Likewise, member pointers consist of a pair of numbers.
Multiplication and division of pointer data types are only forbidden by the compiler.
Ultimately, pointer data types only exist in the compiler. What the CPU works with is not pointers, but integers and memory addresses. So there is nowhere else where these operations on pointer types could be forbidden. You might as well ask for the CPU to forbid concatenation of C++ string objects. It can't do that, because the C++ string type only exists in the C++ language, not in the generated machine code.
However, to answer what you mean, look up the Motorola 68000 CPUs. I believe they have separate registers for integers and memory addresses. Which means that they can easily forbid such nonsensical operations.
All pointer values can be casted to a single integer.
You're safe there. The C and C++ standards guarantee that this is always possible, no matter the memory space layout, CPU architecture and anything else. Specifically, they guarantee an implementation-defined mapping. In other words, you can always convert a pointer to an integer, and then convert that integer back to get the original pointer. But the C/C++ languages say nothing about what the intermediate integer value should be. That is up to the individual compiler, and the hardware it targets.
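A minimal sketch of that round trip (assuming the platform provides the optional intptr_t):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int x = 42;
    int *p = &x;

    intptr_t n = (intptr_t)p; /* the integer value itself is implementation-defined */
    int *q = (int *)n;        /* but converting back must yield the original pointer */

    printf("round trip ok: %d, *q = %d\n", p == q, *q);
    return 0;
}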
Incrementing a pointer is equivalent to adding sizeof(the pointed data type) to the memory address stored by the pointer.
Again, this is guaranteed. If you consider that conceptually, a pointer does not point to an address, it points to an object, then this makes perfect sense. Adding one to the pointer will then obviously make it point to the next object. If an object is 20 bytes long, then incrementing the pointer will move it 20 bytes, so that it moves to the next object.
If a pointer was merely a memory address in a linear address space, if it was basically an integer, then incrementing it would add 1 to the address -- that is, it would move to the next byte.
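A short sketch contrasting the two views - element-wise arithmetic on a typed pointer versus byte-wise arithmetic through char *:

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    int32_t a[4] = {10, 20, 30, 40};
    int32_t *p = a;

    /* p + 1 advances by one whole object, not one byte. */
    printf("*(p + 1) = %" PRId32 "\n", *(p + 1)); /* prints 20 */

    /* The same step measured in bytes: 4 on typical byte-addressed machines. */
    printf("byte distance: %td\n", (char *)(p + 1) - (char *)p);
    return 0;
}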
Finally, as I mentioned in a comment to your question, keep in mind that C++ is just a language. It doesn't care which architecture it is compiled to. Many of these limitations may seem obscure on modern CPU's. But what if you're targeting yesteryear's CPU's? What if you're targeting the next decade's CPU's? You don't even know how they'll work, so you can't assume much about them. What if you're targeting a virtual machine? Compilers already exist which generate bytecode for Flash, ready to run from a website. What if you want to compile your C++ to Python source code?
Staying within the rules specified in the standard guarantees that your code will work in all these cases.
I don't have specific real-world examples in mind, but the "authority" is the C standard. If something is not required by the standard, you can build a conforming implementation that intentionally fails to comply with any other assumption. Some of these assumptions are true most of the time just because it's convenient to implement a pointer as an integer representing a memory address that can be directly fetched by the processor, but this is just a consequence of convenience and can't be held as a universal truth.
Not required by the standard (see this question). For instance, sizeof(int*) can be unequal to sizeof(double*). void* is guaranteed to be able to store any pointer value.
Not required by the standard. By definition, size is a part of representation. If the size can be different, the representation can be different too.
Not necessarily. In fact, "the bit length of an architecture" is a vague statement. What is a 64-bit processor, really? Is it the address bus? Size of registers? Data bus? What?
It doesn't make sense to "multiply" or "divide" a pointer. It's forbidden by the compiler but you can of course multiply or divide the underlying representation (which doesn't really make sense to me) and that results in undefined behavior.
Maybe I don't understand your point but everything in a digital computer is just some kind of binary number.
Yes; kind of. It's guaranteed to point to a location that's sizeof(the pointed-to type) farther. It's not necessarily equivalent to arithmetic addition of a number (i.e. farther is a logical concept here; the actual representation is architecture-specific).
For 6.: a pointer is not necessarily a memory address. See for example "The Great Pointer Conspiracy" by Stack Overflow user jalf:
Yes, I used the word "address" in the comment above. It is important to realize what I mean by this. I do not mean "the memory address at which the data is physically stored", but simply an abstract "whatever we need in order to locate the value". The address of i might be anything, but once we have it, we can always find and modify i.
And:
A pointer is not a memory address! I mentioned this above, but let's say it again. Pointers are typically implemented by the compiler simply as memory addresses, yes, but they don't have to be.
Some further information about pointers from the C99 standard:
6.2.5 §27 guarantees that void* and char* have identical representations, i.e. they can be used interchangeably without conversion, i.e. the same address is denoted by the same bit pattern (which doesn't have to be true for other pointer types)
6.3.2.3 §1 states that any pointer to an incomplete or object type can be cast to (and from) void* and back again and still be valid; this doesn't include function pointers!
6.3.2.3 §6 states that void* can be cast to (and from) integers and 7.18.1.4 §1 provides appropriate types intptr_t and uintptr_t; the problem: these types are optional - the standard explicitly mentions that there need not be an integer type large enough to actually hold the value of a pointer!
sizeof(char*) != sizeof(void(*)(void))? - Not on x86 in 36-bit addressing mode (supported on pretty much every Intel CPU since Pentium 1)
"The in-memory representation of a pointer is the same as an integer of the same bit length" - there's no in-memory representation on any modern architecture; tagged memory has never caught on and was already obsolete before C was standardized. Memory in fact doesn't even hold integers, just bits and arguably words (not bytes; most physical memory doesn't allow you to read just 8 bits.)
"Multiplication of pointers is impossible" - 68000 family; address registers (the ones holding pointers) didn't support that IIRC.
"All pointers can be cast to integers" - Not on PICs.
"Incrementing a T* is equivalent to adding sizeof(T) to the memory address" - true by definition. Also equivalent to &pointer[1].
I don't know about the others, but for DOS, the assumption in #3 is untrue. DOS is 16 bit and uses various tricks to map many more than 16 bits worth of memory.
The in-memory representation of a pointer is the same as an integer of the same bit length as the architecture.
I think this assumption is false because on the 80186, for example, a 32-bit pointer is held in two registers (an offset register and a segment register), and which half-word goes in which register matters during access.
Multiplication and division of pointer data types are only forbidden by the compiler.
You can't multiply or divide types. ;P
I'm unsure why you would want to multiply or divide a pointer.
All pointer values can be casted to a single integer. In other words, what architectures still make use of segments and offsets?
The C99 standard allows pointers to be stored in intptr_t, which is an integer type. So, yes.
Incrementing a pointer is equivalent to adding sizeof(the pointed data type) to the memory address stored by the pointer. If p is an int32* then p+1 is equal to the memory address 4 bytes after p.
x + y where x is a T * and y is an integer is equivalent to (T *)((intptr_t)x + y * sizeof(T)) as far as I know. Alignment may be an issue, but padding may be provided in the sizeof. I'm not really sure.
In general, the answer to all of the questions is "yes", and it's because only those machines that implement popular languages directly saw the light of day and persisted into the current century. Although the language standards reserve the right to vary these "invariants", or assertions, it hasn't ever happened in real products, with the possible exception of items 3 and 4 which require some restatement to be universally true.
It's certainly possible to build segmented MMU designs, which correspond roughly to the capability-based architectures that were popular academically in past years, but no such system has seen common use with such features enabled. Such a system might have conflicted with the assertions, as it would probably have had large pointers.
In addition to segmented/capability MMUs, which often have large pointers, more extreme designs have tried to encode data types in pointers. Few of these were ever built. (This question brings up all of the alternatives to the basic word-oriented, a pointer-is-a-word architectures.)
Specifically:
The in-memory representation of all pointers for a given architecture is the same regardless of the data type pointed to. True except for extremely wacky past designs that tried to implement protection not in strongly-typed languages but in hardware.
The in-memory representation of a pointer is the same as an integer of the same bit length as the architecture. Maybe, certainly some sort of integral type is the same, see LP64 vs LLP64.
Multiplication and division of pointer data types are only forbidden by the compiler. Right.
All pointer values can be casted to a single integer. In other words, what architectures still make use of segments and offsets? Nothing uses segments and offsets today, but a C int is often not big enough, you may need a long or long long to hold a pointer.
Incrementing a pointer is equivalent to adding sizeof(the pointed data type) to the memory address stored by the pointer. If p is an int32* then p+1 is equal to the memory address 4 bytes after p. Yes.
It is interesting to note that every Intel Architecture CPU, i.e., every single PeeCee, contains an elaborate segmentation unit of epic, legendary, complexity. However, it is effectively disabled. Whenever a PC OS boots up, it sets the segment bases to 0 and the segment lengths to ~0, nulling out the segments and giving a flat memory model.
There were lots of "word addressed" architectures in the 1950s, 1960s and 1970s. But I cannot recall any mainstream examples that had a C compiler. I recall the ICL / Three Rivers PERQ machines in the 1980s, which were word addressed and had a writable control store (microcode). One of their instantiations had a C compiler and a flavor of Unix called PNX, but the C compiler required special microcode.
The basic problem is that char* types on word-addressed machines are awkward, however you implement them. You often end up with sizeof(int *) != sizeof(char *) ...
Interestingly, before C there was a language called BCPL in which the basic pointer type was a word address; that is, incrementing a pointer gave you the address of the next word, and ptr!1 gave you the word at ptr + 1. There was a different operator for addressing a byte: ptr%42 if I recall.
EDIT: Don't answer questions when your blood sugar is low. Your brain (certainly, mine) doesn't work as you expect. :-(
Minor nitpick:
p is an int32* then p+1
is wrong, it needs to be unsigned int32, otherwise it will wrap at 2GB.
Interesting oddity - I got this from the author of the C compiler for the Transputer chip - he told me that for that compiler, NULL was defined as -2GB. Why? Because the Transputer had a signed address range: -2GB to +2GB. Can you believe that? Amazing, isn't it?
I've since met various people who have told me that defining NULL like that is broken. I agree, but if you don't, you end up with NULL pointers being in the middle of your address range.
I think most of us can be glad we're not working on Transputers!
I would like to know architectures which violate the assumptions I've
listed below.
I see that Stephen C mentioned PERQ machines, and MSalters mentioned 68000s and PICs.
I'm disappointed that no one else actually answered the question by naming any of the weird and wonderful architectures that have standards-compliant C compilers that don't fit certain unwarranted assumptions.
sizeof(int *) == sizeof(char *) == sizeof(void *) == sizeof(func_ptr
*) ?
Not necessarily. Some examples:
Most compilers for Harvard-architecture 8-bit processors -- PIC and 8051 and M8C -- make sizeof(int *) == sizeof(char *),
but different from sizeof(func_ptr *).
Some of the very small chips in those families have 256 bytes of RAM (or less) but several kilobytes of PROGMEM (Flash or ROM), so compilers often make sizeof(int *) == sizeof(char *) equal to 1 (a single 8-bit byte), but sizeof(func_ptr *) equal to 2 (two 8-bit bytes).
Compilers for many of the larger chips in those families with a few kilobytes of RAM and 128 or so kilobytes of PROGMEM make sizeof(int *) == sizeof(char *) equal to 2 (two 8-bit bytes), but sizeof(func_ptr *) equal to 3 (three 8-bit bytes).
A few Harvard-architecture chips can store exactly a full 2^16 ("64KByte") of PROGMEM (Flash or ROM), and another 2^16 ("64KByte") of RAM + memory-mapped I/O.
The compilers for such a chip make sizeof(func_ptr *) always be 2 (two bytes);
but often have a way to make the other kinds of pointers sizeof(int *) == sizeof(char *) == sizeof(void *) into a "long ptr" 3-byte generic pointer that has an extra magic bit indicating whether the pointer points into RAM or PROGMEM.
(That's the kind of pointer you need to pass to a "print_text_to_the_LCD()" function when you call that function from many different subroutines, sometimes with the address of a variable string in buffer that could be anywhere in RAM, and other times with one of many constant strings that could be anywhere in PROGMEM).
Such compilers often have special keywords ("short" or "near", "long" or "far") to let programmers specifically indicate three different kinds of char pointers in the same program -- constant strings that only need 2 bytes to indicate where in PROGMEM they are located, non-constant strings that only need 2 bytes to indicate where in RAM they are located, and the kind of 3-byte pointers that "print_text_to_the_LCD()" accepts.
Most computers built in the 1950s and 1960s use a 36-bit word length or an 18-bit word length, with an 18-bit (or less) address bus.
I hear that C compilers for such computers often use 9-bit bytes,
with sizeof(int *) == sizeof(func_ptr *) == 2, which gives 18 bits, since all integers and functions have to be word-aligned; but sizeof(char *) == sizeof(void *) == 4 to take advantage of special PDP-10 instructions that store such pointers in a full 36-bit word.
That full 36-bit word includes an 18-bit word address, plus a few more bits in the other 18 that (among other things) indicate the bit position of the pointed-to character within that word.
The in-memory representation of all pointers for a given architecture
is the same regardless of the data type pointed to?
Not necessarily. Some examples:
On any one of the architectures I mentioned above, pointers come in different sizes. So how could they possibly have "the same" representation?
Some compilers on some systems use "descriptors" to implement character pointers and other kinds of pointers.
Such a descriptor is different for a pointer pointing to the first "char" in a "char big_array[4000]" than for a pointer pointing to the first "char" in a "char small_array[10]", which are arguably different data types, even when the small array happens to start at exactly the same location in memory previously occupied by the big array.
Descriptors allow such machines to catch and trap the buffer overflows that cause such problems on other machines.
The "Low-Fat Pointers" used in the SAFElite and similar "soft processors" have analogous "extra information" about the size of the buffer that the pointer points into. Low-Fat pointers have the same advantage of catching and trapping buffer overflows.
The in-memory representation of a pointer is the same as an integer of
the same bit length as the architecture?
Not necessarily. Some examples:
In "tagged architecture" machines, each word of memory has some bits that indicate whether that word is an integer, or a pointer, or something else.
With such machines, looking at the tag bits would tell you whether that word was an integer or a pointer.
I hear that Nova minicomputers have an "indirection bit" in each word which inspired "indirect threaded code". It sounds like storing an integer clears that bit, while storing a pointer sets that bit.
Multiplication and division of pointer data types are only forbidden
by the compiler. NOTE: Yes, I know this is nonsensical. What I mean is
- is there hardware support to forbid this incorrect usage?
Yes, some hardware doesn't directly support such operations.
As others have already mentioned, the "multiply" instruction in the 68000 and the 6809 only work with (some) "data registers"; they can't be directly applied to values in "address registers".
(It would be pretty easy for a compiler to work around such restrictions -- to MOV those values from an address register to the appropriate data register, and then use MUL).
All pointer values can be casted to a single data type?
Yes.
In order for memcpy() to work right, the C standard mandates that every object pointer value of every kind can be cast to a void pointer ("void *").
The compiler is required to make this work, even for architectures that still use segments and offsets.
All pointer values can be casted to a single integer? In other words,
what architectures still make use of segments and offsets?
I'm not sure.
I suspect that all pointer values can be cast to the "size_t" and "ptrdiff_t" integral data types defined in "<stddef.h>".
Incrementing a pointer is equivalent to adding sizeof(the pointed data
type) to the memory address stored by the pointer. If p is an int32*
then p+1 is equal to the memory address 4 bytes after p.
It is unclear what you are asking here.
Q: If I have an array of some kind of structure or primitive data type (for example, a "#include <stdint.h> ... int32_t example_array[1000]; ..."), and I increment a pointer that points into that array (for example, "int32_t *p = &example_array[99]; ... p++; ..."), does the pointer now point to the very next consecutive member of that array, which is sizeof(the pointed data type) bytes further along in memory?
A: Yes, the compiler must make the pointer, after incrementing it once, point at the next independent consecutive int32_t in the array, sizeof(the pointed data type) bytes further along in memory, in order to be standards compliant.
Q: So, if p is an int32* , then p+1 is equal to the memory address 4 bytes after p?
A: When sizeof( int32_t ) is actually equal to 4, yes. Otherwise, such as for certain word-addressable machines including some modern DSPs where sizeof( int32_t ) may equal 2 or even 1, then p+1 is equal to the memory address 2 or even 1 "C bytes" after p.
Q: So if I take the pointer, and cast it into an "int" ...
A: One type of "All the world's a VAX heresy".
Q: ... and then cast that "int" back into a pointer ...
A: Another type of "All the world's a VAX heresy".
Q: So if I take the pointer p which is a pointer to an int32_t, and cast it into some integral type that is plenty big enough to contain the pointer, and then add sizeof( int32_t ) to that integral type, and then later cast that integral type back into a pointer -- when I do all that, the resulting pointer is equal to p+1?
Not necessarily.
Lots of DSPs and a few other modern chips have word-oriented addressing, rather than the byte-oriented processing used by 8-bit chips.
Some of the C compilers for such chips cram 2 characters into each word, but it takes 2 such words to hold an int32_t - so they report that sizeof( int32_t ) is 4.
(I've heard rumors that there's a C compiler for the 24-bit Motorola 56000 that does this).
The compiler is required to arrange things such that doing "p++" with a pointer to an int32_t increments the pointer to the next int32_t value.
There are several ways for the compiler to do that.
One standards-compliant way is to store each pointer to an int32_t as a "native word address".
Because it takes 2 words to hold a single int32_t value, the C compiler compiles "int32_t * p; ... p++" into some assembly language that increments that pointer value by 2.
On the other hand, if that one does "int32_t * p; ... int x = (int)p; x += sizeof( int32_t ); p = (int32_t *)x;", that C compiler for the 56000 will likely compile it to assembly language that increments the pointer value by 4.
I'm most used to pointers being used in a contiguous, virtual memory
space.
Several PIC and 8086 and other systems have non-contiguous RAM --
a few blocks of RAM at addresses that "made the hardware simpler".
With memory-mapped I/O or nothing at all attached to the gaps in address space between those blocks.
It's even more awkward than it sounds.
In some cases -- such as with the bit-banding hardware used to avoid problems caused by read-modify-write -- the exact same bit in RAM can be read or written using 2 or more different addresses.
