Can I user the 2 most significant bytes of a pointer for meta data? [duplicate]

Can I user the 2 most significant bytes of a pointer for meta data? [duplicate] - c

I read that a 64-bit machine actually uses only 48 bits of address (specifically, I'm using Intel core i7).
I would expect that the extra 16 bits (bits 48-63) are irrelevant for the address, and would be ignored. But when I try to access such an address I got a signal EXC_BAD_ACCESS.
My code is:
int *p1 = &val;
int *p2 = (int *)((long)p1 | 1ll<<48);//set bit 48, which should be irrelevant
int v = *p2; //Here I receive a signal EXC_BAD_ACCESS.
Why this is so? Is there a way to use these 16 bits?
This could be used to build more cache-friendly linked list. Instead of using 8 bytes for next ptr, and 8 bytes for key (due to alignment restriction), the key could be embedded into the pointer.

The high order bits are reserved in case the address bus would be increased in the future, so you can't use it simply like that
The AMD64 architecture defines a 64-bit virtual address format, of which the low-order 48 bits are used in current implementations (...) The architecture definition allows this limit to be raised in future implementations to the full 64 bits, extending the virtual address space to 16 EB (264 bytes). This is compared to just 4 GB (232 bytes) for the x86.
http://en.wikipedia.org/wiki/X86-64#Architectural_features
More importantly, according to the same article [Emphasis mine]:
... in the first implementations of the architecture, only the least significant 48 bits of a virtual address would actually be used in address translation (page table lookup). Further, bits 48 through 63 of any virtual address must be copies of bit 47 (in a manner akin to sign extension), or the processor will raise an exception. Addresses complying with this rule are referred to as "canonical form."
As the CPU will check the high bits even if they're unused, they're not really "irrelevant". You need to make sure that the address is canonical before using the pointer. Some other 64-bit architectures like ARM64 have the option to ignore the high bits, therefore you can store data in pointers much more easily.
That said, in x86_64 you're still free to use the high 16 bits if needed (if the virtual address is not wider than 48 bits, see below), but you have to check and fix the pointer value by sign-extending it before dereferencing.
Note that casting the pointer value to long is not the correct way to do because long is not guaranteed to be wide enough to store pointers. You need to use uintptr_t or intptr_t.
int *p1 = &val; // original pointer
uint8_t data = ...;
const uintptr_t MASK = ~(1ULL << 48);
// === Store data into the pointer ===
// Note: To be on the safe side and future-proof (because future implementations
// can increase the number of significant bits in the pointer), we should
// store values from the most significant bits down to the lower ones
int *p2 = (int *)(((uintptr_t)p1 & MASK) | (data << 56));
// === Get the data stored in the pointer ===
data = (uintptr_t)p2 >> 56;
// === Deference the pointer ===
// Sign extend first to make the pointer canonical
// Note: Technically this is implementation defined. You may want a more
// standard-compliant way to sign-extend the value
intptr_t p3 = ((intptr_t)p2 << 16) >> 16;
val = *(int*)p3;
WebKit's JavaScriptCore and Mozilla's SpiderMonkey engine as well as LuaJIT use this in the nan-boxing technique. If the value is NaN, the low 48-bits will store the pointer to the object with the high 16 bits serve as tag bits, otherwise it's a double value.
Previously Linux also uses the 63rd bit of the GS base address to indicate whether the value was written by the kernel
In reality you can usually use the 48th bit, too. Because most modern 64-bit OSes split kernel and user space in half, so bit 47 is always zero and you have 17 top bits free for use
You can also use the lower bits to store data. It's called a tagged pointer. If int is 4-byte aligned then the 2 low bits are always 0 and you can use them like in 32-bit architectures. For 64-bit values you can use the 3 low bits because they're already 8-byte aligned. Again you also need to clear those bits before dereferencing.
int *p1 = &val; // the pointer we want to store the value into
int tag = 1;
const uintptr_t MASK = ~0x03ULL;
// === Store the tag ===
int *p2 = (int *)(((uintptr_t)p1 & MASK) | tag);
// === Get the tag ===
tag = (uintptr_t)p2 & 0x03;
// === Get the referenced data ===
// Clear the 2 tag bits before using the pointer
intptr_t p3 = (uintptr_t)p2 & MASK;
val = *(int*)p3;
One famous user of this is the V8 engine with SMI (small integer) optimization. The lowest bit in the address will serve as a tag for type:
if it's 1, the value is a pointer to the real data (objects, floats or bigger integers). The next higher bit (w) indicates that the pointer is weak or strong. Just clear the tag bits and dereference it
if it's 0, it's a small integer. In 32-bit V8 or 64-bit V8 with pointer compression it's a 31-bit int, do a signed right shift by 1 to restore the value; in 64-bit V8 without pointer compression it's a 32-bit int in the upper half
32-bit V8
|----- 32 bits -----|
Pointer: |_____address_____w1|
Smi: |___int31_value____0|
64-bit V8
|----- 32 bits -----|----- 32 bits -----|
Pointer: |________________address______________w1|
Smi: |____int32_value____|0000000000000000000|
https://v8.dev/blog/pointer-compression
So as commented below, Intel has published PML5 which provides a 57-bit virtual address space, if you're on such a system you can only use 7 high bits
You can still use some work around to get more free bits though. First you can try to use a 32-bit pointer in 64-bit OSes. In Linux if x32abi is allowed then pointers are only 32-bit long. In Windows just clear the /LARGEADDRESSAWARE flag and pointers now have only 32 significant bits and you can use the upper 32 bits for your purpose. See How to detect X32 on Windows?. Another way is to use some pointer compression tricks: How does the compressed pointer implementation in V8 differ from JVM's compressed Oops?
You can further get more bits by requesting the OS to allocate memory only in the low region. For example if you can ensure that your application never uses more than 64MB of memory then you need only a 26-bit address. And if all the allocations are 32-byte aligned then you have 5 more bits to use, which means you can store 64 - 21 = 43 bits of information in the pointer!
I guess ZGC is one example of this. It uses only 42 bits for addressing which allows for 242 bytes = 4 × 240 bytes = 4 TB
ZGC therefore just reserves 16TB of address space (but not actually uses all of this memory) starting at address 4TB.
A first look into ZGC
It uses the bits in the pointer like this:
6 4 4 4 4 4 0
3 7 6 5 2 1 0
+-------------------+-+----+-----------------------------------------------+
|00000000 00000000 0|0|1111|11 11111111 11111111 11111111 11111111 11111111|
+-------------------+-+----+-----------------------------------------------+
| | | |
| | | * 41-0 Object Offset (42-bits, 4TB address space)
| | |
| | * 45-42 Metadata Bits (4-bits) 0001 = Marked0
| | 0010 = Marked1
| | 0100 = Remapped
| | 1000 = Finalizable
| |
| * 46-46 Unused (1-bit, always zero)
|
* 63-47 Fixed (17-bits, always zero)
For more information on how to do that see
Allocating Memory Within A 2GB Range
How can I ensure that the virtual memory address allocated by VirtualAlloc is between 2-4GB
Allocate at low memory address
How to malloc in address range > 4 GiB
Custom heap/memory allocation ranges
Side note: Using linked list for cases with tiny key values compared to the pointers is a huge memory waste, and it's also slower due to bad cache locality. In fact you shouldn't use linked list in most real life problems
Bjarne Stroustrup says we must avoid linked lists
Why you should never, ever, EVER use linked-list in your code again
Number crunching: Why you should never, ever, EVER use linked-list in your code again
Bjarne Stroustrup: Why you should avoid Linked Lists
Are lists evil?—Bjarne Stroustrup

A standards-compliant way to canonicalize AMD/Intel x64 pointers (based on the current documentation of canonical pointers and 48-bit addressing) is
int *p2 = (int *)(((uintptr_t)p1 & ((1ull << 48) - 1)) |
~(((uintptr_t)p1 & (1ull << 47)) - 1));
This first clears the upper 16 bits of the pointer. Then, if bit 47 is 1, this sets bits 47 through 63, but if bit 47 is 0, this does a logical OR with the value 0 (no change).

I guess no-one mentioned possible use of bit fields ( https://en.cppreference.com/w/cpp/language/bit_field ) in this context, e.g.
template<typename T>
struct My64Ptr
{
signed long long ptr : 48; // as per phuclv's comment, we need the type to be signed to be sign extended
unsigned long long ch : 8; // ...and, what's more, as Peter Cordes pointed out, it's better to mark signedness of bit field explicitly (before C++14)
unsigned long long b1 : 1; // Additionally, as Peter found out, types can differ by sign and it doesn't mean the beginning of another bit field (MSVC is particularly strict about it: other type == new bit field)
unsigned long long b2 : 1;
unsigned long long b3 : 1;
unsigned long long still5bitsLeft : 5;
inline My64Ptr(T* ptr) : ptr((long long) ptr)
{
}
inline operator T*()
{
return (T*) ptr;
}
inline T* operator->()
{
return (T*)ptr;
}
};
My64Ptr<const char> ptr ("abcdefg");
ptr.ch = 'Z';
ptr.b1 = true;
ptr.still5bitsLeft = 23;
std::cout << ptr << ", char=" << char(ptr.ch) << ", byte1=" << ptr.b1 <<
", 5bitsLeft=" << ptr.still5bitsLeft << " ...BTW: sizeof(ptr)=" << sizeof(ptr);
// The output is: abcdefg, char=Z, byte1=1, 5bitsLeft=23 ...BTW: sizeof(ptr)=8
// With all signed long long fields, the output would be: abcdefg, char=Z, byte1=-1, 5bitsLeft=-9 ...BTW: sizeof(ptr)=8
I think it may be quite a convenient way to try to make use of these 16 bits, if we really want to save some memory. All the bitwise (& and |) operations and cast to full 64-bit pointer are done by compiler (though, of course, executed in run time).

According to the Intel Manuals (volume 1, section 3.3.7.1) linear addresses has to be in the canonical form. This means that indeed only 48 bits are used and the extra 16 bits are sign extended. Moreover, the implementation is required to check whether an address is in that form and if it is not generate an exception. That's why there is no way to use those additional 16 bits.
The reason why it is done in such way is quite simple. Currently 48-bit virtual address space is more than enough (and because of the CPU production cost there is no point in making it larger) but undoubtedly in the future the additional bits will be needed. If applications/kernels were to use them for their own purposes compatibility problems will arise and that's what CPU vendors want to avoid.

Physical memory is 48 bit addressed. That's enough to address a lot of RAM. However between your program running on the CPU core and the RAM is the memory management unit, part of the CPU. Your program is addressing virtual memory, and the MMU is responsible for translating between virtual addresses and physical addresses. The virtual addresses are 64 bit.
The value of a virtual address tells you nothing about the corresponding physical address. Indeed, because of how virtual memory systems work there's no guarantee that the corresponding physical address will be the same moment to moment. And if you get creative with mmap() you can make two or more virtual addresses point at the same physical address (wherever that happens to be). If you then write to any of those virtual addresses you're actually writing to just one physical address (wherever that happens to be). This sort of trick is quite useful in signal processing.
Thus when you tamper with the 48th bit of your pointer (which is pointing at a virtual address) the MMU can't find that new address in the table of memory allocated to your program by the OS (or by yourself using malloc()). It raises an interrupt in protest, the OS catches that and terminates your program with the signal you mention.
If you want to know more I suggest you Google "modern computer architecture" and do some reading about the hardware that underpins your program.

Related

convert uint8_t* to uint64_t

What's more recommended or advisable way to convert array of uint8_t at offset i to uint64_t and why?
uint8_t * bytes = ...
uint64_t const v = ((uint64_t *)(bytes + i))[0];
or
uint64_t const v = ((uint64_t)(bytes[i+7]) << 56)
| ((uint64_t)(bytes[i+6]) << 48)
| ((uint64_t)(bytes[i+5]) << 40)
| ((uint64_t)(bytes[i+4]) << 32)
| ((uint64_t)(bytes[i+3]) << 24)
| ((uint64_t)(bytes[i+2]) << 16)
| ((uint64_t)(bytes[i+1]) << 8)
| ((uint64_t)(bytes[i]));

There are two primary differences.
One, the behavior of ((uint64_t *)(bytes + i))[0] is not defined by the C standard (unless certain prerequisites about what bytes point to are met). Generally, an array of bytes should not be accessed using a uint64_t type.
When memory defined as one type is accessed with another type, it is called aliasing, and the C standard only defines certain combinations of aliasing. Some compilers may support some aliasing beyond what the standard requires, but using it is not portable. Additionally, if bytes + i is not suitably aligned for a uint64_t, the access may cause an exception or otherwise malfunction.
Two, loading the bytes through aliasing, if it is defined (by the standard or by compiler extension), interprets the bytes using the memory ordering for the C implementation. Some C implementations store the bytes representing integers in memory from low address to high address for low-position-value bytes to high-position-value bytes, and some store them from high address to low address. (And they can be stored in non-consecutive orders too, although this is rare.) So loading the bytes this way will produce different values from the same bytes in memory based on what order the C implementation uses.
But loading the bytes and using shifts to combine them will always produce the same value from the same bytes in memory regardless of what order the C implementation uses.
The first method should be avoided, because there is no need for it. If one desires to interpret the bytes using the C implementation’s ordering, this can be done with:
uint64_t t;
memcpy(&t, bytes+i, sizeof t);
const uint64_t v = t;
Using memcpy provides a portable way of aliasing the uint64_t to store bytes into it. Good compilers recognize this idiom and will optimize the memcpy to a load from memory, if suitable for the target architecture (and if optimization is enabled).
If one desires to interpret the bytes using little-endian ordering, as shown in the code in the question, then the second method may be used. (Sometimes platforms will have routines that may provide more efficient code for this.)

You can also use memcpy
uint64_t n;
memcpy(&n, bytes + i, sizeof(uint64_t));
const uint64_t v = n;

The first option has two big problems that qualify as undefined behavior (anything can happen):
A uint8_t* or array of uint8_t is not necessarily aligned the same way as required by a larger type like uint64_t. Simply casting to uint64_t* leads to misaligned access. This can cause hardware exceptions, program crashes, slower code etc, all depending on the alignment requirements of the specific target.
It violates the internal type system of C, where each object in memory known by the compiler has an "effective type" that the compiler keeps track of. Based on this, the compiler is allowed to make certain assumptions regarding if a certain memory region have been accessed or not during optimization. If your code violates these type rules, as it would in this case, wrong machine code could get generated.
This is most commonly referred to as the strict aliasing rule and your cast followed by dereferencing would be a so-called "strict aliasing violation".
The second option is sound code, because:
When doing shifts or other forms of bitwise arithmetic, a large integer type should be used. That is, unsigned int or larger - depending on system. Using signed types or small integer types can lead to undefined behavior or unexpected results. See Implicit type promotion rules regarding problems with small integer types implicitly changing signedness in some expressions.
If not for the cast to uint64_t, then the bytes[i+7] << 56 shift would involve an implicit promotion of the left operand from uint8_t to int, which would be a bug. Because if the most significant bit (MSB) of the byte is set and we shift into/beyond the sign bit, we invoke undefined behavior - again, anything can happen.
And naturally we need to use a 64 bit type in this specific case or otherwise we wouldn't be able to shift as far as 56 bits. Shifting beyond the range of the type of the left operand is also undefined behavior.
Note that whether to pick the order of bytes[i+7] << 56 versus the alternative bytes[i+0] << 56 depends on the underlying CPU endianess. Bit shifts are nice since the actual shift ignores if the destination type is using big or little endian. But in this case you must know in advance which byte in the source array you want to correspond to the most significant. This code you have here will work if the array was built based on little endian formatting, since the last byte of the array is shifted to the highest address.
As for the uint64_t const v = , the const qualifier is a bit strange to have at local scope like that. It's harmless but confusing and doesn't really add anything of value inside a local scope. I would just drop it.

How are 14-bit memory addresses accessed?

I know that each two letters in a hexdecimal address represents a byte, meaning that 0xFFFF is a 16bit address and can represent 65,536 bytes of memory, if the system is byte-addressable. However, if talking about a number of bits that is not a multiple of 8 (such as 14bit address), how can the operating system represnt these addresses?
0xFFFF -> 16 bit (e.g. Virtual memory address)
14 bit -> 0xFFF ? (Physical memory address)
One might say that the system has to be not byte addressable to access a not multiple of 8 address.. Then what will it be? I want the 16 bit of the virtual address to be byte addressable so we can easily access the data stored at that address, and I want to represent it in C code, but I have trouble representing the 14 bit physical memory addresses.
#define MAX_VIRT_ADDR ((0xffff) - 1) /* 65,536 */
#define MAX_PHYS_ADDR ((?) - 1) /* Max of 14bit physical memory space */

The addresses are still just numbers. On a system with a 14-bit address bus, they go from 00000000000000 (binary) up to 11111111111111 (binary).
We can also write those numbers in decimal: they go from 0 up to 16383.
Or we can write them in hexadecimal: they go from 0 up to 3FFF (or 0000 up to 3FFF).
Or in octal: they go from 0 up to 37777 (or 00000 up to 37777).
Or in any other system we like.
Typically a system based on 8-bit bytes will allow 2 bytes to be used to access a memory address. If the system has some kind of memory protection unit, and it's configured appropriately, then addresses above 3FFF may cause some kind of invalid address exception (i.e. a segfault). Otherwise, usually the extra bits are just ignored, so that no matter whether the program accesses "address" 0x0005, 0x4005, 0x8005, or 0xC005, the actual address sent to the address bus is 5 (binary: 00000000000101).

Your maximum "virtual" 16-bit address is 0xFFFF. Your maximum "physical" 14-bit address is 0x3FFF.
There's no rule that says address sizes have to be powers of 2, or that they have to be the same size as the words being addressed. It's massively more convenient to do things that way, but not required.
The old Motorola 68K had 32-bit words but a 24-bit address bus - address values were stored in 32-bit words with the upper 8 bits left unused.
As for mapping your 16-bit "virtual" address space onto a 14-bit "physical" address space, treat the upper two bits in the virtual address as a page number, treat the lower 14 bits as the offset into the page, map them directly to "physical" addresses. Store in a 16-bit type like uint16_t, then use macros to extract page number and address like so:
#define PAGENO(vaddr) ((0xC000 & vaddr) >> 14)
#define PHADDR(vaddr) (0x3FFF & vaddr)

It's simple: your 14bit address is
struct {
unsigned addr : 14;
};
You simply ignore 2 bits of the 16bit value.
Another way is to sign extend the 14bit value to 16bit. That's what 64bit systems like AMD64 or ARM64 do. They have 43-56 bits address space depending on the CPU and that is sign extended to 64bit. Addresses where the top bits aren't all 0 or all 1 are illegal.

Exotic systems where bytes have more than 8 bits still (as far as I know) use the same addressing on byte level. Only the size of a byte changes. Certain digital signal processors like this do exist in the real world, although those would not run on an OS but get coded as "bare metal". Traditionally, DSP programming is also most often done in assembler rather than C.
So for any system with 16 bit wide address bus, you'll have:
#define MAX_VIRT_ADDR 0xffffu /* 65,535 */
#define MAX_PHYS_ADDR 0xffffu /* 65,535 */
And the size of one byte is irrelevant. If you'd design the system in any other way, you'd probably lose the advantage of having a larger byte size.

Replacing Bitfields with Bitshifting in an Embedded Register Struct

I'm trying to get a little fancier with how I write my drivers for peripherals in embedded applications.
Naturally, reading and writing to predefined memory mapped areas is a common task, so I try to wrap as much stuff up in a struct as I can.
Sometimes, I want to write to the whole register, and sometimes I want to manipulate a subset of bits in this register. Lately, I've read some stuff that suggests making a union that contains a single uintX type that's big enough to hold the whole register (usually 8 or 16 bits), as well as a struct that has a collection of bitfields in it to represent the specific bits of that register.
After reading a few comments on some of these posts that have this strategy outlined for managing multiple control/status registers for a peripheral, I concluded that most people with experience in this level of embedded development dislike bitfields largely due to lack of portability and eideness issues between different compilers...Not to mention that debugging can be confounded by bitfields as well.
The alternative that most people seem to recommend is to use bit shifting to ensure that the driver will be portable between platforms, compilers and environments, but I have had a hard time seeing this in action.
My question is:
How do I take something like this:
typedef union data_port
{
uint16_t CCR1;
struct
{
data1 : 5;
data2 : 3;
data3 : 4;
data4 : 4;
}
}
And get rid of the bitfields and convert to a bit-shifting scheme in a sane way?
Part 3 of this guys post here describes what I'm talking about in general...Notice at the end, he puts all the registers (wrapped up as unions) in a struct and then suggests to do the following:
define a pointer to refer to the can base address and cast it as a pointer to the (CAN) register file like the following.
#define CAN0 (*(CAN_REG_FILE *)CAN_BASE_ADDRESS)
What the hell is this cute little move all about? CAN0 is a pointer to a pointer to a function of a...number that's #defined as CAN_BASE_ADDRESS? I don't know...He lost me on that one.

The C standard does not specify how much memory a sequence of bit-fields occupies or what order the bit-fields are in. In your example, some compilers might decide to use 32 bits for the bit-fields, even though you clearly expect it to cover 16 bits. So using bit-fields locks you down to a specific compiler and specific compilation flags.
Using types larger than unsigned char also has implementation-defined effects, but in practice it is a lot more portable. In the real world, there are just two choices for an uintNN_t: big-endian or little-endian, and usually for a given CPU everybody uses the same order because that's the order that the CPU uses natively. (Some architectures such as mips and arm support both endiannesses, but usually people stick to one endianness across a large range of CPU models.) If you're accessing a CPU's own registers, its endianness may be part of the CPU anyway. On the other hand, if you're accessing a peripheral, you need to take care.
The documentation of the device that you're accessing will tell you how big a memory unit to address at once (apparently 2 bytes in your example) and how the bits are arranged. For example, it might state that the register is a 16-bit register accessed with a 16-bit load/store instructions whatever the CPU's endianness is, that data1 encompasses the 5 low-order bits, data2 encompasses the next 3, data3 the next 4 and data4 the next 4. In this case, you would declare the register as a uint16_t.
typedef volatile uint16_t data_port_t;
data_port_t *port = GET_DATA_PORT_ADDRESS();
Memory addresses in devices almost always need to be declared volatile, because it matters that the compiler reads and writes to them at the right time.
To access the parts of the register, use bit-shift and bit-mask operators. For example:
#define DATA2_WIDTH 3
#define DATA2_OFFSET 5
#define DATA2_MAX (((uint16_t)1 << DATA2_WIDTH) - 1) // in binary: 0000000000000111
#define DATA2_MASK (DATA2_MAX << DATA2_OFFSET) // in binary: 0000000011100000
void set_data2(data_port_t *port, unsigned new_field_value)
{
assert(new_field_value <= DATA2_MAX);
uint16_t old_register_value = *port;
// First, mask out the data2 bits from the current register value.
uint16_t new_register_value = (old_register_value & ~DATA2_MASK);
// Then mask in the new value for data2.
new_register_value |= (new_field_value << DATA2_OFFSET);
*port = new_register_value;
}
Obviously you can make the code a lot shorter. I separated it out into individual tiny steps so that the logic should be easy to follow. I include a shorter version below. Any compiler worth its salt should compile to the same code except in non-optimizing mode. Note that above, I used an intermediate variable instead of doing two assignments to *port because doing two assignments to *port would change the behavior: it would cause the device to see the intermediate value (and another read, since |= is both a read and a write). Here's the shorter version, and a read function:
void set_data2(data_port_t *port, unsigned new_field_value)
{
assert(new_field_value <= DATA2_MAX);
*port = (*port & ~(((uint16_t)1 << DATA2_WIDTH) - 1) << DATA2_OFFSET))
| (new_field_value << DATA2_OFFSET);
}
unsigned get_data2(data_port *port)
{
return (*port >> DATA2_OFFSET) & DATA2_MASK;
}
#define CAN0 (*(CAN_REG_FILE *)CAN_BASE_ADDRESS)
There is no function here. A function declaration would have a return type followed by an argument list in parentheses. This takes the value CAN_BASE_ADDRESS, which is presumably a pointer of some type, then casts the pointer to a pointer to CAN_REG_FILE, and finally dereferences the pointer. In other words, it accesses the CAN register file at the address given by CAN_BASE_ADDRESS. For example, there may be declarations like
void *CAN_BASE_ADDRESS = (void*)0x12345678;
typedef struct {
const volatile uint32_t status;
volatile uint16_t foo;
volatile uint16_t bar;
} CAN_REG_FILE;
#define CAN0 (*(CAN_REG_FILE *)CAN_BASE_ADDRESS)
and then you can do things like
CAN0.foo = 42;
printf("CAN0 status: %d\n", (int)CAN0.status);

1.
The problem when getting rid of bitfields is that you can no more use simple assignment statements, but you must shift the value to write, create a mask, make an AND to wipe out the previous bits, and use an OR to write the new bits. Reading is similar reversed. For example, let's take an 8-bit register defined like this:
val2.val1
0000.0000
val1 is the lower 4 bits, and val2 is the upper 4. The whole register is named REG.
To read val1 into tmp, one should issue:
tmp = REG & 0x0F;
and to read val2:
tmp = (REG >> 4) & 0xF; // AND redundant in this particular case
or
tmp = (REG & 0xF0) >> 4;
But to write tmp to val2, for example, you need to do:
REG = (REG & 0x0F) | (tmp << 4);
Of course some macro can be used to facilitate this, but the problem, for me, is that reading and writing require two different macros.
I think that bitfield is the best way, and a serious compiler should have options to define endiannes and bit ordering of such bitfields. Anyway, this is the future, even if, for now, maybe not every compiler has full support.
2.
#define CAN0 (*(CAN_REG_FILE *)CAN_BASE_ADDRESS)
This macro defines CAN0 as a dereferenced pointer to the base address of the CAN register(s), no function declaration is involved. Suppose you have an 8-bit register at address 0x800. You could do:
#define REG_BASE 0x800 // address of the register
#define REG (*(uint8_t *) REG_BASE)
REG = 0; // becomes *REG_BASE = 0
tmp = REG; // tmp=*REG_BASE
Instead of uint_t you can use a struct type, and all the bits, and probably all the bytes or words, go magically to their correct place, with the right semantics. Using a good compiler of course - but who doesn't want to deploy a good compiler?
Some compilers have/had extensions to assign a given address to a variable; for example old turbo pascal had the ABSOLUTE keyword:
var CAN: byte absolute 0x800:0000; // seg:ofs...!
The semantic is the same as before, only more straightforward because no pointer is involved, but this is managed by the macro and the compiler automatically.

Address casting in C

Question related to little endian and big endian:
unsigned int i = 0x12345678; // assuming int is 4 bytes.
unsigned char* pc = &i;
Now, if *pc is 12 that means it is BIG ENDIAN because lowest address is storing MSB and if 78 Little. Is my understanding correct?
If yes, then my question is, why pc will get the lowest address of i? How does it works? Also, how many memory addresses will be needed to store i?
Assume a 32 bit Architecture

Any variable will be identified by a single address, regardless of the size of the type.
If int i is stored at address 0x1f00, it takes up the four bytes 0x1f00, 0x1f01, 0x1f02 and 0x1f03 of space. Still, when you create a reference to it, you will get the start address only, because the size is implied by the type.
So when you create a reference to an int and cast it to a char reference, you don't change the address, you simply tell the compiler to treat it as a char instead. I.e. the address is still 0x1f00, and when you dereference it you will read whatever is stored there, which is like you say MSB for little endian and LSB for big endian machines.

Yes; your understanding is correct.
Because that is the way C works, and because it makes sense. The address of i is the starting byte followed by 3 other bytes (in the 32-bit architecture). That is, given the starting address and the size, the bytes where the value is stored is &i+0, &i+1, &i+2 and &i+3 (rather than &i-0, &i-1, &i-2 and &i-3 which you seem to think might make some sense).
For most practical purposes, it really makes very little difference, so it really isn't something to sweat over. As long as the hardware is self-consistent, all works OK. There were (still are) some chips that could be switched from big-endian to little-endian at run-time.

int i = 0X01234567; // int occupies 4 bytes
Address in the increasing order
--------------------------------------->
+----------+-----------+-----------+----------+----------+
| 0013FF59 | 0013FF60 | 0013FF61 |0013FF62 | 0013FF63 |
+----------+-----------+-----------+----------+----------+
| | 67 | 45 | 23 | 01 |
+----------+-----------+-----------+----------+----------+
why pc will get the lowest address of i? How does it works?
when you say
unsigned char* pc = &i;
pc gets the lowest address. and when you say int j = *pc, the system knows, int occupies 4 bytes and it is little endian. It knows how to get the number from the address stored in pc.
Also, how many memory addresses will be needed to store i?
On your system 4 memory locations are need to store i. And the access i, 1 memory address is enough.

Alternative of FP_SEG and FP_OFF for converting pointer to linear address

On 16 bit dos machine there is options like FP_SEG and FP_OFF for converting a pointer to linear address but since these method no more exist on 32 bit compiler what are other function that can do same on 32 bit machine??

They're luckily not needed as 32-bit mode is unsegmented and hence the addresses are always linear (a simplification, but let's keep it simple).
EDIT: The first version was confusing let's try again.
In 16-bit segmented mode (I'm exclusively referring to legacy DOS programs here, it'll probably be similar for other 16-bit x86 OSes) addresses are given in a 32-bit format consisting of a16-bit segment and a 16-bit offset. These are combined to form a 20-bit linear address (This is where the infamous 640K barrier comes from, 2**20 = 1MB and 384K are reserved for the system and bios leaving ~640K for user programs) by multiply the segment by 16 = 0x10 (equivalent to shifting left by 4) and adding the offset. I.e.: linear = segment*0x10 + offset.
This means that 2**12 segment:address type pointers will refer to the same linear address, so in general there is no way to obtain the 32-bit value used to form the linear address.
In old DOS programs that used far - segmented - pointers (as opposed to near pointers, which only contained an offset and implicitly used ds segment register) they were usually treated as 32-bit unsigned integer values where the 16 most significant bits were the segment and the 16 least significant bits the offset. This gives the following macro definitions for FP_SEG and FP_OFF (using the types from stdint.h):
#define FP_SEG(x) (uint16_t)((uint32_t)(x) >> 16) /* grab 16 most significant bits */
#define FP_OFF(x) (uint16_t)((uint32_t)(x)) /* grab 16 least significant bits */
To convert a 20-bit linear address to a segmented address you have many options (2**12). One way could be:
#define LIN_SEG(x) (uint16_t)(((uint32_t)(x)&0xf0000)>>4)
#define LIN_OFF(x) (uint16_t)((uint32_t)(x))
Finally a quick example of how it all works together:
Segmented address: a = 0xA000:0x0123
As 32-bit far pointer b = 0xA0000123
20-bit linear address: c = 0xA0123
FP_SEG(b) == 0xA000
FP_OFF(b) == 0x0123
LIN_SEG(c) = 0xA000
LIN_OFF(c) = 0x0123

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight