Is this pointer code legal on 64-bit computers - c

I plan to split a region of memory between two pointers; call them pointer1 and pointer2. Each pointer will refer to its own share of memory, defined by block1 and block2 respectively.
I think this way works for all systems (both 32 and 64 bit):
char block1[100000];
char *pointer1=block1;
char block2[100000];
char *pointer2=block2;
However, I think a faster way would be to use this code:
char block[200000];
char *pointer1=block;
char *pointer2=block+100000;
My question is: would the last line of the second code fragment be compatible with a 64-bit architecture?

The address space of a 32-bit architecture is 2**32 = 4294967296 bytes; for a 64-bit architecture it is 2**64 = 18446744073709551616. You will be fine: the compiler handles this on its own. For your use case it is plain, simple pointer arithmetic that stays within the address space.

What you have done is set up a memory pool in its most basic form. Your example uses char arrays and char pointers, so you are unlikely to get unwanted results; however, if your second pointer were, for instance, long * (with proper casting), you would run into alignment differences, which can cause significantly slower code unless you take special precautions to align the offsets manually (using hex values instead of decimal for offsets makes this a bit more obvious).
So in a more complex scenario it would matter, because long may need to be aligned to 4 or 8 bytes depending on the platform.
I apologize for going a bit beyond the scope of the question, but I didn't want someone mistakenly extrapolating what is fine for char to mixed types carved out of a char[].
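To make the alignment point concrete, here is a minimal sketch (my own illustration, not from the question) of carving a long * out of the char pool while rounding the offset up to the type's alignment; it assumes a C11 compiler for _Alignof:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    static char block[200000];

    char *pointer1 = block;   /* char needs no special alignment */

    /* Round the second offset up to _Alignof(long) before casting.
       100000 happens to be 8-byte aligned already, but the rounding
       makes the intent explicit for arbitrary offsets. */
    uintptr_t p = (uintptr_t)(block + 100000);
    size_t a = _Alignof(long);
    p = (p + a - 1) & ~(uintptr_t)(a - 1);
    long *pointer2 = (long *)p;

    printf("%p %p\n", (void *)pointer1, (void *)pointer2);
    return 0;
}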

Related

Interlocked operations and alignment with _aligned_malloc

I am concerned about alignment and Interlocked operations. Again. The documentation for these functions states that the variable we want to update should be aligned on a 32-bit boundary, and that we can achieve this via _aligned_malloc. Fine.
So I have this small test program:
struct S
{
    char c;
    long l;
} an_S;

printf("%p, %p", (void*)(&(an_S.c)), (void*)(&(an_S.l)));
In release mode, this always prints an address for the long that is 4 bytes after the address of the char, so it starts on a 32-bit boundary.
1) Is this purely by chance, or can I rely on it, making _aligned_malloc unnecessary?
2) If I do have to use _aligned_malloc, can someone clarify how? I've read the documentation at https://msdn.microsoft.com/en-us/library/8z34s9c6.aspx but it doesn't seem to show how to assign a value to the memory that is 'allocated'...
3) (Assuming I do need _aligned_malloc) If I want an array of structures containing a long like the above, to be operated on via an Interlocked operation, do I need to add some sort of constructor to set this up, or is there an easier way?
4) I did a Google search for _aligned_malloc+InterlockedCompareExchange and it brought back only 70 results. That tells me that either the bulk of the code out there that uses InterlockedCompareExchange (62,800 results) is wrong, or _aligned_malloc isn't necessary. Can someone please clarify?
If your structures are aligned, which is the default, then each member will be aligned suitably for its type.
As far as malloc goes, the MSVC documentation explains that on 32-bit targets the memory is 8-byte aligned, and on 64-bit targets it is 16-byte aligned. So you are fine to use malloc.
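To address question 2 concretely, here is a minimal sketch using the documented MSVC signature _aligned_malloc(size, alignment); the returned memory is assigned to like any other allocation (the size and alignment values here are just illustrative):

#include <stdio.h>
#include <malloc.h>  /* MSVC header declaring _aligned_malloc */

int main(void)
{
    /* Allocate 100 longs aligned on a 32-byte boundary. */
    long *p = (long *)_aligned_malloc(100 * sizeof(long), 32);
    if (p == NULL)
        return 1;

    p[0] = 42;  /* assign to the aligned memory like any other pointer */
    printf("%p holds %ld\n", (void *)p, p[0]);

    _aligned_free(p);  /* must pair with _aligned_malloc, not free() */
    return 0;
}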

How are addresses resolved by a compiler in a medium memory model?

I'm new to programming CPUs with small/medium memory models. I am working with an embedded processor that has 256 KB of flash code space at addresses 0x00000 to 0x3FFFF, and 20 KB of RAM within addresses 0xF0000 to 0xFFFFF. There are compiler options to choose between small, medium, or large memory models; I have medium selected. My question is: how does the compiler differentiate between a code/flash address and a RAM address?
Take for example: I have a 1-byte variable at RAM address 10, and I have a const variable at the real address 10. I did something like:
value = *((unsigned char *)10);
How would the compiler choose between the real address 10 and the (virtual?) address 10? I suppose if I wanted the value at the real address 10 I would use:
value = *((const unsigned char *)10);
?
Also, can you explain the following code which I believe is related to the answer:
uint32_t var32;       // 32-bit unsigned integer.
unsigned char *ptr;   // 2-byte pointer on this target.
ptr = (unsigned char *)5;
var32 = (uint32_t)ptr;
printf("%lu", var32);
The code prints 983045 (0xF0005 in hex). It seems unrealistic: how can a 16-bit variable return a value greater than what 16 bits can store?
Read your compiler's documentation to find out details about each memory model.
It may have various sorts of pointers, e.g. char near * being 2 bytes and char far * being 4 bytes. Alternatively (or in addition), it might have instructions for changing code pages which you'd have to invoke manually.
how can a 16-bit variable return a value greater than what 16 bits can store?
It can't. Your code converts the pointer to a 32-bit int, and 0xF0005 fits in a 32-bit int. Based on your description, I'd guess that a plain char * points only into the data area, and that you would use a different sort of pointer to point into the code area.
I tried to comment on Matt's answer but my comment was too long, and I think it might be an answer, so here's my comment:
I think this is an answer, though I'm really looking for more details. I've read the manual but it doesn't have much information on the topic. You are right: the compiler has near/far keywords you can use to manually specify the address type. I guess the C compiler knows whether a variable is behind a near or a far pointer, and if it's a near pointer it generates instructions that map the 2-byte near pointer to a real address; these generated mapping instructions are opaque to the C programmer. That would be my only guess. This is why the pointer returns a value greater than its 16-bit range: the compiler maps the address to an absolute address before it stores the value in var32. This is possible because 1) the RAM addresses begin at 0xF0000 and end at 0xFFFFF, so you can always map a near address to its absolute address by OR-ing it with 0xF0000, and 2) there is no overlap between a code (far) address and a near address OR'd with 0xF0000. Can anyone confirm?
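A tiny sketch of the mapping guessed at above (the near-pointer value is hypothetical, and it assumes RAM really does occupy 0xF0000 to 0xFFFFF):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t near_ptr = 0x0005;  /* hypothetical 16-bit near pointer */

    /* If RAM occupies 0xF0000-0xFFFFF, the absolute address is
       recovered by OR-ing in the fixed high bits. */
    uint32_t absolute = 0xF0000UL | near_ptr;

    printf("%lu\n", (unsigned long)absolute);  /* prints 983045 */
    return 0;
}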
My first suggestion would be to read the documentation; however, as I can see, you have already done that.
So my assumption is that you ended up working on, for example, a large existing codebase developed with a not too widely supported compiler for a not too well known architecture.
In such a case (after all attempts at acquiring proper documentation have failed), my approach would be to generate assembly output for test programs and analyse it. I did this a while ago, so this is not from thin air (it was an 8051 PL/M compiler running on an MDS-70, which was emulated by a DOS-based emulator from the late 80s, for which DOS was emulated by DOSBox; yes, for the huge codebase we needed to maintain, we couldn't get around this mess).
So build simple programs that do something with pointers, compile them without optimizations to assembly (or request an assembly dump, whatever the compiler can do for you), and understand the output. Try to cover all the pointer types and memory models your compiler provides. It will clarify what is happening, and the existing documentation will also help more once you understand its gaps this way. Finally, don't stop at understanding just enough for the immediate problem; document the gaps properly, so that later you won't need to redo the experiments to figure out things you had once almost worked out.
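As a starting point, such a test program might look like the sketch below; the variable names are made up, and -S is just the common assembly-output switch, so substitute whatever your toolchain provides.

/* ptr_test.c: compile with your compiler's assembly-output option
   (often -S) and inspect how each access is addressed. */
volatile unsigned char ram_var;       /* expect a near/RAM access  */
const unsigned char rom_var = 0x42;   /* expect a far/flash access */

unsigned char read_ram(void) { return ram_var; }
unsigned char read_rom(void) { return rom_var; }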

Assemblers and word alignment

Today I learned that if you declare a char variable (which is 1 byte), the assembler actually uses 4 bytes in memory so that the boundaries lie on multiples of the word size.
If a char variable uses 4 bytes anyway, what is the point of declaring it as a char? Why not declare it as an int? Don't they use the same amount of memory?
When you are writing in assembly language and declare space for a character, the assembler allocates space for one character and no more. (I write in regard to common assemblers.) If you want to align objects in assembly language, you must include assembler directives for that purpose.
When you write in C, and the compiler translates it to assembly and/or machine code, space for a character may be padded. Typically this is not done because of alignment benefits for character objects but because you have several things declared in your program. For example, consider what happens when you declare:
char a;
char b;
int i;
char c;
double d;
A naïve compiler might do this:
Allocate one byte for a at the beginning of the relevant memory, which happens to be aligned to a multiple of, say, 16 bytes.
Allocate the next byte for b.
Then it wants to place the int i which needs four bytes. On this machine, int objects must be aligned to multiples of four bytes, or a program that attempts to access them will crash. So the compiler skips two bytes and then sets aside four bytes for i.
Allocate the next byte for c.
Skip seven bytes and then set aside eight bytes for d. This makes d aligned to a multiple of eight bytes, which is beneficial on this hypothetical machine.
So, even with a naïve compiler, a character object does not require four whole bytes to itself. It can share space with neighboring character objects, or other objects that do not require greater alignment. But there will be some wasted space.
A smarter compiler will do this:
Sort the objects it has to allocate space for according to their alignment requirements.
Place the most restrictive object first: Set aside eight bytes for d.
Place the next most restrictive object: Set aside four bytes for i. Note that i is aligned to a multiple of four bytes because it follows d, which is an eight-byte object aligned to a multiple of eight bytes.
Place the least restrictive objects: Set aside one byte each for a, b, and c.
This sort of reordering avoids wasting space, and any decent compiler will use it for memory that it is free to arrange (such as automatic objects on stack or static objects in global memory).
When you declare members inside a struct, the compiler is required to use the order in which you declare the members, so it cannot perform this reordering to save space. In that case, declaring a mixture of character objects and other objects can waste space.
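A small sketch of that effect (my own example, assuming a typical 64-bit ABI where double requires 8-byte alignment):

#include <stdio.h>

/* Declaration order forces padding after a and after b. */
struct mixed  { char a; double d; char b; };
/* Sorting members by alignment requirement removes most of it. */
struct sorted { double d; char a; char b; };

int main(void)
{
    printf("mixed:  %zu\n", sizeof(struct mixed));   /* often 24 */
    printf("sorted: %zu\n", sizeof(struct sorted));  /* often 16 */
    return 0;
}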
Q: Does a program allocate four bytes for every "char" you declare?
A: No - absolutely not ;)
Q: Is it possible that, if you allocate a single byte, the program might "pad" with extra bytes?
A: Yes - absolutely yes.
The issue is "alignment". Some computer architectures must access a data value at a particular offset: a multiple of 16 bits, 32 bits, etc. Other architectures merely perform better when accesses are aligned. Hence "padding":
http://en.wikipedia.org/wiki/Byte_padding#Data_structure_padding
There may indeed not be any point in declaring a single char variable.
There may however be many good reasons to want a char-array, where an int-array really wouldn't do the trick!
(Try padding a data structure with ints...)
Others have for the most part answered this. Assuming a char is a single byte, does declaring a char mean it always pads to an alignment boundary? No; some compilers do by default, some don't, and with many you can change the default via some option somewhere. Does this mean you shouldn't use a char? It depends. First, the padding doesn't always happen, so the few wasted bytes don't always appear. You are programming in a high-level language using a compiler, so if you think you have only 3 wasted bytes in your whole binary... think again. Depending on the architecture, using chars can bring savings; for example, loading immediates saves three or more bytes on some architectures. On other architectures, even simple operations on a register require extra instructions to sign-extend or mask the larger register so it behaves like a byte-sized register. If you are on a 32-bit computer and you are using an 8-bit char just because you are only counting from 1 to 100, you might want to use a full-sized int; in the long run you are probably not saving anyone anything by using the char. Now, if this is an 8086-based PC running DOS, that is a different story, and on an 8-bit microcontroller you want to lean toward 8-bit variables as much as possible.

C why would a pointer be larger than an integer

I am playing around with sizeof() in GCC on a Linux machine right now, and I found something very surprising.
printf ("\nsize of int = %lu\n", sizeof(int));
printf ("\nsize of int* = %lu\n", sizeof(int *));
yields
size of int = 4
size of int* = 8
I thought the size of a pointer to an integer would be much smaller than the actual integer itself!
I am researching embedded software right now, and I was under the impression that passing by reference was more efficient (in terms of power) than passing by value.
Could someone please clarify why it is more efficient to pass by reference than by value if the size of the pointer is larger than the actual value.
Thanks!
An int can be any size the compiler writer likes; the only rules (in standard C) are: a) int is not smaller than a short nor bigger than a long, and b) int is at least 16 bits.
It's not uncommon on a 64-bit platform to keep int at 32 bits for compatibility.
Passing by reference is more efficient than passing by value when the value to be passed is larger than the size of the reference. It makes a lot of sense to pass by reference if what you are passing is a large struct/object. It also makes sense to pass a reference if you want to make persistent modifications to your value.
Passing by reference is more efficient because no data (other than the pointer) needs to be copied. This means it is only more efficient when passing data that is larger than a pointer on the system used: classes with many fields, structs, and so on.
In the case you mentioned it could indeed be more efficient to not use a pointer because the actual value is smaller than a pointer to it (at least on the machine you were using).
Bear in mind that on a 32-bit machine a pointer is 4 bytes (4*8 = 32 bits), while on the 64-bit machine you were apparently using, a pointer is 8 bytes (8*8 = 64 bits).
On even older 16-bit machines, pointers required only 2 bytes; there may be some embedded systems still using this architecture, but I'm not sure.
In C, a pointer, any pointer, is just a memory address. You're on a 64-bit machine, and at the hardware level, memory addresses are referred to with 64-bit values. This is why 64-bit machines can use much more memory than 32-bit machines.
A pointer to an integer can point at a single integer, but the same pointer can also point at ten, twenty, one hundred or one million integers.
Obviously passing a single 8-byte pointer in lieu of a single 4-byte integer is not a win; but passing a single 8-byte pointer in lieu of one million 4-byte integers certainly is.
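To make that concrete, a minimal sketch (struct big and both sum functions are hypothetical, sized so the struct dwarfs an 8-byte pointer):

#include <stdio.h>

struct big { int values[10000]; };  /* ~40 KB, far larger than a pointer */

/* By value: the whole struct is copied on every call. */
static long sum_by_value(struct big b)
{
    long s = 0;
    for (int i = 0; i < 10000; i++) s += b.values[i];
    return s;
}

/* By reference: only an 8-byte pointer is copied. */
static long sum_by_pointer(const struct big *b)
{
    long s = 0;
    for (int i = 0; i < 10000; i++) s += b->values[i];
    return s;
}

int main(void)
{
    static struct big x;  /* zero-initialized */
    printf("%ld %ld\n", sum_by_value(x), sum_by_pointer(&x));
    return 0;
}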
One thing has nothing to do with the other. One is an address, a pointer to something; it doesn't matter whether that something is a char, short, int, or structure. The other is a language-specific thing called an int, which the compiler for that system, that compiler version, and perhaps the command-line options happen to define as some size.
It appears you are running on a 64-bit system, so all your pointers/addresses are going to be 64 bits. What they point to is a separate discussion: longs are probably going to be 64 bits as well, sometimes ints; shorts are probably still 16 bits, but that's not a hard-and-fast rule; and chars are hopefully 8 bits, but also not a hard-and-fast rule.
Where this can get even worse is cross-compiling. Using llvm-gcc, before clang was as solid as it is now, with a 64-bit host the bytecode was all generated based on the 64-bit host: 64-bit integers, 64-bit pointers, etc. Then, when the backend ran for the ARM target, it had to use compiler library calls for all of this 64-bit work. The variables hardly needed to be shorts, much less ints, but they were ints. The -m32 switch was broken: you still got 64-bit integers, derived from the host rather than the ultimate target. gcc used directly doesn't appear to have this problem, and clang+llvm doesn't currently have it either.
The short answer is: the language defines some data types (char, short, int, long, etc.), and those data types have compiler-implementation-defined sizes; an address is just another implementation-defined data type. It is like asking why a short is not the same number of bytes as a long: because they are different data types. One is an address, the other is a variable; two different things.
http://developers.sun.com/solaris/articles/ILP32toLP64Issues.html
When converting 32-bit programs to 64-bit programs, only long types and pointer types change in size from 32 bits to 64 bits; integers of type int stay at 32 bits in size.
In 64-bit executables, pointers are 64 bits. Long ints are also 64 bits, but ints are only 32 bits.
In 32-bit executables, pointers, ints, and long ints are all 32 bits. 32-bit executables also support 64-bit "long long" ints.
A pointer must be able to reference all of memory. If the (virtual) memory is larger than 4 GiB or so, then the pointer must be more than 32 bits.

Does the size of pointers vary in C? [duplicate]

Possible Duplicates:
Can the Size of Pointers Vary Depending on what’s Pointed To?
Are there are any platforms where pointers to different types have different sizes?
Is it possible that the size of a pointer to a float in C differs from the size of a pointer to an int? Having tried it out, I get the same result for all kinds of pointers.
#include <stdio.h>
#include <stdlib.h>
int main()
{
printf("sizeof(int*): %i\n", sizeof(int*));
printf("sizeof(float*): %i\n", sizeof(float*));
printf("sizeof(void*): %i\n", sizeof(void*));
return 0;
}
Which outputs here (OS X 10.6, 64-bit):
sizeof(int*): 8
sizeof(float*): 8
sizeof(void*): 8
Can I assume that pointers of different types have the same size (on one arch of course)?
Pointers are not always the same size on the same arch.
You can read more on the concept of "near", "far" and "huge" pointers, just as an example of a case where pointer sizes differ...
http://en.wikipedia.org/wiki/Intel_Memory_Model#Pointer_sizes
In days of old, using e.g. Borland C compilers on the DOS platform, there were a total of (I think) 5 memory models, which could even be mixed to some extent. Essentially, you had a choice of small or large pointers to data, small or large pointers to code, and a "tiny" model where code and data shared a common address space of (if I remember correctly) 64K.
It was possible to specify "huge" pointers within a program that was otherwise built in the "tiny" model. So in the worst case it was possible to have different sized pointers to the same data type in the same program!
I think the standard doesn't even forbid this, so theoretically an obscure C compiler could do this even today. But there are doubtless experts who will be able to confirm or correct this.
Pointers to data must always be convertible to void*, so nowadays they are generally realized as types of the same width.
This does not hold for function pointers, which may have a different width. For that reason, in C99 casting a function pointer to void* is undefined behavior.
As I understand it there is nothing in the C standard which guarantees that pointers to different types must be the same size, so in theory an int * and a float * on the same platform could be different sizes without breaking any rules.
There is a requirement that char * and void * have the same representation and alignment requirements, and there are various other similar requirements for different subsets of pointer types but there's nothing that encompasses everything.
In practice, you're unlikely to run into any implementation that uses different-sized pointers unless you head into some fairly obscure places.
Yes. It's uncommon, but this certainly happens on systems that are not byte-addressable, e.g. a 16-bit system with 64 Kwords = 128 KB of memory. On such systems you can still have 16-bit int pointers, but a char pointer to an 8-bit char needs an extra bit to indicate the high/low byte within a word, so you'd have 17-bit (in practice, 32-bit) char pointers.
This might sound exotic, but many DSPs spend 99.x% of their time executing specialized numerical code. A sound DSP can be a bit simpler if all it has to deal with is 16-bit data, leaving the occasional 8-bit math to be emulated by the compiler.
I was going to write a reply saying that C99 has various pointer conversion requirements that more or less ensure that pointers to data have to be all the same size. However, on reading them carefully, I realised that C99 is specifically designed to allow pointers to be of different sizes for different types.
For instance, on an architecture where integers are 4 bytes and must be 4-byte aligned, an int pointer could be two bits smaller than a char or void pointer. Provided the cast actually does the shift in both directions, you're fine with C99. It helpfully says that the result of casting a char pointer to an incorrectly aligned int pointer is undefined.
See the C99 standard. Section 6.3.2.3
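As a concrete illustration of that clause (a sketch of my own; the commented-out line is the one the standard leaves undefined):

#include <stdio.h>

int main(void)
{
    int storage[2] = { 0, 0 };
    char *bytes = (char *)storage;  /* always OK: int* -> char* */

    int *ok = (int *)bytes;         /* OK: back to its real alignment */
    /* bytes + 1 is misaligned for int; per C99 6.3.2.3p7 the conversion
       itself is already undefined behavior, even without dereferencing:
       int *bad = (int *)(bytes + 1); */

    printf("%p\n", (void *)ok);
    return 0;
}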
Yes, the size of a pointer is platform dependent. More specifically, the size of a pointer depends on the target processor architecture and the "bit-ness" you compile for.
As a rule of thumb, on a 64-bit machine a pointer is usually 64 bits, and on a 32-bit machine usually 32 bits. There are exceptions, however.
Since a pointer is just a memory address, it's always the same size regardless of what the memory it points to contains. So pointers to a float, a char, or an int are all the same size.
Can I assume that pointers of different types have the same size (on one arch of course)?
On platforms with a flat memory model (i.e., all popular/modern platforms), pointer sizes are the same.
On platforms with a segmented memory model, for efficiency, there are often platform-specific pointer types of different sizes (e.g., far pointers in DOS, since the 8086 CPU used a segmented memory model). But this is platform-specific and non-standard.
You should probably also keep in mind that in C++ the size of a normal pointer may differ from the size of a pointer to a member function. Pointers to member functions have to carry extra information to work properly with polymorphism. This is probably the only exception I'm aware of that is still relevant (since I doubt the segmented memory model will ever make a comeback).
There are platforms where function pointers are a different size than other pointers.
I've never seen more variation than this. All other pointers must be at most sizeof(void*) since the standard requires that they can be cast to void* without loss of information.
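A quick way to check this on a given platform (a minimal sketch; on flat-memory systems both lines usually print the same number, but the standard doesn't require it):

#include <stdio.h>

int main(void)
{
    printf("object pointer:   %zu\n", sizeof(void *));
    printf("function pointer: %zu\n", sizeof(void (*)(void)));
    return 0;
}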
A pointer is a memory address, and hence should be the same size on a given machine: a 32-bit machine => 4 bytes, a 64-bit machine => 8 bytes.
Hence, irrespective of the data type of the thing the pointer points to, the size of a pointer on a specific machine is the same (since the space required to store a memory address is the same).
Assumption: I'm talking about near pointers to data values, the kind you declared in your question.