When do I use xdata? - c

I am new at embedded system programming. I am working on a device that uses an 8051 chipset. I have noticed in the sample programs that when defining variables, sometimes they use the keyword xdata. like this...
static unsigned char xdata PatternSize;
while other times the xdata keyword is omitted.
My understanding is that the xdata keyword instructs the compiler that that variable is to be stored in external, flash, memory.
In what cases should I store variables externally with xdata? Accessing those variables takes longer, right? Values stored using xdata do not remain after a hard reset of the device do they?
Also, I understand that the static keyword means that the variable will persist through every call to the function it is defined in. Do static and xdata have to be used together?

The 8051 architecture has three separate address spaces, the core RAM uses an 8 bit address, so can be up to 256 bytes, XDATA is a 16bit address space (64Kbytes) with read/write capability, and the program space is a 16bit address space with execution and read-only data capability. Because of its small address range and close coupling to the core, addressing the core RAM is more efficient in terms of code space and access cycles
The original 8051 core had tiny-on-chip RAM (an address space of 256 bytes but some variants had half that in actual memory), and XDATA referred to off-chip data memory (as opposed to program memory). However most modern 8051 architecture devices have on-chip XDATA and program memory.
So you might use the core memory when performance is critical and XDATA for larger memory objects. However the compiler should in most cases make this decision for you (check your compilr's manual, it will describe in detail how memory is allocated). The instruction set makes it efficient to implement the stack in core memory, whereas static and dynamically allocated data would usually be more sensibly allocated in XDATA. If the compiler has an XDATA keyword, then it will override the compiler's strategy, and should only be used when the compiler's strategy somehow fails since it will reduce the portability of the code.
[edit] Note also that the core memory includes a 32byte bit-addressable region, the bit-addressing instructions use an 8bit address into this region to access individual bits directly. The region exists within the 256byte byte addressable core memory, so is both bit and byte addressable[/edit]

xdata tells the compiler that the data is stored in external RAM so it has to use a different instruction to read and write that memory instead of internal RAM.
Accessing external data does take longer. I usually put interrupt variables in internal RAM and most large arrays in external RAM.
As to the state of the external RAM after a hard reset (not power cycle): That would depend on the hardware setup. Does a reset line go to the external chip? Also some chips come with XDATA within the CPU chip. Read that again. Some chips have an 8051 CPU plus some amount of XDATA within the IC.
static and xdata do not overlap. Static tells the compiler how to allocate a variable (on a stack or at a memory location). Xdata tells the compiler how to get to that variable. Static can also restrict the name space of that variable to just that file. You can have an xdata static variable that is local to just a function, and have a static variable that is local to a function but uses internal RAM.

An important point not yet mentioned is that because different instructions are used to access different memory areas, the hardware has no unified concept of a "pointer". Any address which is known to be in DATA/IDATA space may be uniquely identified with a one-byte pointer; likewise any address which is known to be in PDATA space. Any address which is known to be in CODE space may be identified with a two-byte pointer; likewise any address which is known to be in XDATA space. In many cases, though, a routine like memcpy won't know in advance which memory space should be used with the passed-in pointers. To accommodate that, 8x51 compilers generally use a three-byte pointer type which may be used to access things in any memory space (one byte selects which type of instructions should be used with the pointer, and the other bytes hold the value). A pointer declaration like:
char *ptr;
will define a three-byte pointer which can point to any memory space. Changing the declaration to
char xdata *data ptr;
will define a two-byte pointer which is stored in DATA space, but which can only point to things in the XDATA space. Likewise
char data * data ptr;
will define a two-byte pointer which is stored in DATA space, but which can only point to things in the DATA and IDATA spaces. Code which uses pointers that point to a known data space will be much faster (possibly by a factor of ten) than code which uses the "general-purpose" three-byte pointers.

How and when to use xData memory area depends on the system architecture. Some systems may have RAM at this address while others could have ROM or Flash. In either case, access will be slower than accessing internal RAM, ROM or Flash.
Generally speaking, large items, constant items and lesser used items should go into xData. There are no standard rules as to what goes in xData, as it depends on the architecture.

The 8051 has a 128 byte range of scratch pad "pseudo-registers" that (most) compilers use as the default for declared variables. But obviously this area is very small, and you want to be able to put variables in the 16 bit memory address space too. That's what the xdata (i.e. "external data") specifier is for. What to put where depends, obviously, on what the data is and how you plan on using it.
Basically, I think this is the wrong question. You need to understand your CPU architecture first before learning how to use the C compiler's 8051-specific features.

Related

How are const char and char pointers represented in memory in STM32

How does MCU now that the string a variable is pointing on is in data memory or in program memory?
What does compiler do when I'm casting a const char * to char * (e.g. when calling strlen function)?
Can char * be used as a char * and const char * without any performance loss?
The STM32s use a flat 32-bit address space, so RAM and program memory (flash) are in the same logical space.
The Cortex core of course knows which type of memory is where, probably through hardware address decoders that are triggered by the address being accessed. This is of course way outside the scope of what C cares about, though.
Dropping const is not a run-time operation, so there should be no performance overhead. Of course dropping const is bad, since somewhere you risk someone actually believing that a const pointer means data there won't be written to, and going back on that promise can make badness happen.
By taking a STM32F4 example with 1MB flash/ROM memory and 192KB RAM memory (128KB SDRAM + 64KB CCM) - the memory map looks something as follows:
Flash/ROM - 0x08000000 to 0x080FFFFF (1MB)
RAM - 0x20000000 to 0x2001FFFF (128KB)
There's more areas with separate address spaces that I won't cover here for the simplicity of the explanation. Such memories include Backup SRAM and CCM RAM, just to name two. In addition, each area may be further divided sections, such as RAM being divided to bss, stack and heap.
Now onto your question about strings and their locations - constant strings, such as:
const char *str = "This is a string in ROM";
are placed in flash memory. During compilation, the compiler places a temporary symbol that references such string. Later during linking phase, the linker (which knows about concrete values for each memory section) lays down all of your data (program, constant data etc.) in each section one after another and - once it knows concrete values of each such object - replaces those symbols placed by the compiler with concrete values which then appear in your binary. Because of this, later on during runtime when the assignment above is done, your str variable is simply assigned a constant value deduced by the linker (such as 0x08001234) which points directly to the first byte of the string.
When it comes to dynamically allocated values - whenever you call malloc or new a similar task is done. Assuming sufficient memory is available, you are given the address to the requested chunk of memory in RAM and those calculations are during runtime.
As for the question regarding const qualifier - there is not meaning to it once the code is executed. For example, during runtime the strlen function will simply go over memory byte-by-byte starting at the passed location and ending once binary 0 is encountered. It doesn't matter what "type" of bytes are being analyzed, because this information is lost once your code is converted to byte code. Regarding const in your context - const qualifier appearing in function parameter denotes that such function will not modify the contents of the string. If it attempted to, a compilation error would be raised, unless it implicitly performs a cast to a non-const type. You may, of course, pass a non-const variable as a const parameter of a function. The other way however - that is passing a const parameter to a non-const function - will raise an error, as this function may potentially modify the contents of the memory you point to, which you implicitly specified to be non-modifiable by making it const.
So to summarize and answer your question: you can do casts as much as you want and this will not be reflected at runtime. It's simply an instruction to the compiler to treat given variable differently than the original during its type checks. By doing an implicit cast, you should however be aware that such cast may potentially be unsafe.
With and without const, assuming your string is truly read only, is going to change whether it lands in .data or .rodata or some other read only section (.text, etc). Basically is it going to be in flash or in ram.
The flash on these parts if I remember right at best has an extra wait state or is basically half the speed of ram...at best. (Fairly common for mcus in general, there are exceptions though). If you are running in the slower range of clocks, of you boost the clock the the ram performance vs flash will improve. So having it in flash (const) vs sram is going to be slower for any code that parses through that string.
This assumes your linker script and bootstrap are such that .data is actually copied to ram on boot...

Stack and heap confusion for embedded 8051

I am trying to understand a few basics concepts regarding the memory layout for a 8051 MCU architecture. I would be grateful if anyone could give me some clarifications.
So, for a 8051 MCU we have several types of memories:
IRAM - (idata) - used for general purpose registers and SFRs
PMEG - (code) - used to store code - (FLASH)
XDATA
on chip (data) - cache memory for data (RAM) /
off-chip (xdata) - external memory (RAM)
Questions:
So where is the stack actually located?
I would assume in IRAM (idata) but it's quite small (30-7Fh)- 79 bytes
What does the stack do?
Now, on one hand I read that it stores the return addresses when we call a function (e.g. when I call a function the return address is stored on the stack and the stack pointer is incremented).
http://www.alciro.org/alciro/microcontroladores-8051_24/subrutina-subprograma_357_en.htm
On the other hand I read that the stack stores our local variables from a function, variables which are "deleted" once we return from that function.
http://gribblelab.org/CBootcamp/7_Memory_Stack_vs_Heap.html
If I use dynamic memory allocation (heap), will that memory always be reserved in off-chip RAM (xdata), or it depends on compiler/optimization?
The 8051 has its origin in the 1970ies/early 80ies. As such, it has very limited ressources. The original version did (for instance) not even have XRAM, that was "patched" aside later and requires special (ans slow) accesses.
The IRAM is the "main memory". It really includes the stack (yes, there are only few bytes). The rest is used for global variables ("data" and "bss" section: initialized and uninitialized globals and statics). The XRAM might be used by a compiler for the same reason.
Note that with these small MCUs you do not use many local variables (and if, only 8bit types). A clever compiler/linker (I actually used some of these) can allocate local variables statically overlapping - unless there is recursion used (very unlikely).
Most notably, programs for such systems mostly do not use a heap (i.e. dynamic memory allocation), but only statically allocated memory. At most, they might use a memory pool, which provides blocks of fixed size and does not merged blocks.
Note that the IRAM includes some special registers which can be bit-addressed by the hardware. Normally, you would use a specialized compiler which can exploit these functions. Very likely some features require special assembler functions (these might be provided in a header as C-functions just generating the corresponding machine code instruction), called intrinsics.
The different memory areas might also require compiler-extensions to be used.
You might have a look at sdcc for a suitable compiler.
Note also that the 8051 has an extended harvard architecture (code and data seperated with XRAM as 3rd party).
Regarding your 2nd link: This is a very generalized article; it does not cover MCUs like the 8051 (or AVR, PIC and the like), but more generalized CPUs like x86, ARM, PowerPC, MIPS, MSP430 (which is also a smaller MCU), etc. using an external von Neumann architecture (internally most (if not all) 32+ bitters use a harvard architecture).
I don't have direct experience with your chips, but I have worked with very constrained systems in the past. So here is what I can answer:
Question 1 and 2: The stack is more than likely set within a very early startup routine. This will set a register to tell it where the stack should start. Typically, you want this in memory that is very fast to access because compiled code loves pushing and popping memory from the stack all the time. This includes return addresses in calls, local variable declarations, and the occasional call to directly allocate stack memory (alloca).
For your 3rd question, the heap is set wherever your startup routine set it to.
There is no particular area that a heap needs to live. If you want it to live in external memory, then it can be set there. You want it in your really small/fast area, you can do that too, though that is probably a very bad idea. Again, your chip's/compiler's manual or included code should show you an overloaded call to malloc(). From here, you should be able to walk backwards to see what addresses are being passed into its memory routines.
Your IRAM is so dang small that it feels more like Instruction RAM - RAM where you would put a subroutine or two to make running code from them more efficient. 80 bytes of stack space will evaporate very quickly in a typical C function call framework. Actually, for sizes like this, you might have to hand assemble stuff to get the most out of things, but that may be beyond your scope.
If you have other questions, let me know. This is the kind of stuff I like doing :)
Update
This page has a bunch of good information on stack management for your particular chip. It appears that the stack for this chip is indeed in IRAM and is very very constrained. It also appears that assembly level coding on this chip would be the norm as this amount of RAM is quite small indeed.
Heck, this is the first system I've seen in many years that has bank switching as a way to access more RAM. I haven't done that since the Color Gameboy's Z80 chip.
Concerning the heap:
There is also a malloc/free couple
You have to call init_mempool(), which is indicated in compiler documentation but it is somewhat uncommon.
The pseudo-code below to illustrate this.
However I used it only this way and did not try heavy used of malloc/free like you may find in dynamic linked list management, so I have no idea of the performance you get out of this.
//A "large" place in xdata to be used as heap
static char xdata heap_mem_pool [1000];
//A pointer located in data and pointing to something in xdata
//The size of the pointer is then 2 bytes instead of 3 ( the 3rd byte
//store the area specification data, idata, xdata )
//specifier not mandatory but welcome
char xdata * data shared_memory;
//...
u16 mem_size_needed;
init_mempool (heap_mem_pool, sizeof(heap_mem_pool));
//..
mem_size_needed = calcute_needed_memory();
shared_memory = malloc(mem_size_needed);
if ( 0 == shared_memory ) return -1;
//...use shared_memory pointer
//free if not needed anymore
free(shared_memory);
Some additionnal consequences about the fact that in general no function is reentrant ( or with some effort ) due to this stackless microcontroller.
I will call "my system" the systemI am working on at the present time: C8051F040 (Silab) with Keil C51 compiler ( I have no specific interest in these 2 companies )
The (function return address) stack is located low in the iram (idata on my system).
If it start at 30(dec) it means you have either global or local variables in your code that you requested to be in data RAM ( either because you choose a "small" memory model or because you use the keyword data in the variable declaration ).
Whenever you call a function the return 2 bytes address of the caller function will be save in this stack ( 16 bits code space ) and that's all: no registers saving, no arguments pushed onto the (non-existing)(data) stack. Your compiler may also limit the functions call depth.
Necessary arguments and local variables ( and certainly saved registers ) are placed somewhere in the RAM ( data RAM or XRAM )
So now imagine that you want to use the same innocent function ( like memcpy() ) both in your interrupt and in you normal infinite loop, it will cause sporadic bugs. Why ?
Due to the lack of stack, the compiler must share RAM memory places for arguments, local variables ... between several functions THAT DO NOT BELONG to the same call tree branch
The pitfall is that an interrupt is its own call tree.
So if an interrupt occurs while you were executing e.g the memcpy() in your "normal task", you may corrupt the execution of memcpy() because when going out of the interrupt execution, the pointers dedicated to the copy performed in the normal task will have the (end) value of the copy performed in the interrupt.
On my system I get a L15 linker error when the compiler detects that a function is called by more than one independant "branch"
You may make a function reentrant with the addition of the reentrant keyword and requiring the creation of an emulated stack on the top of the XRAM for example. I did not test on my system because I am already lacking of XRAM memory which is only 4kB.
See link
C51: USING NON-REENTRANT FUNCTION IN MAIN AND INTERRUPTS
In the standard 8051 uC, the stack occupies the same address space as register bank 1(08H to 0FH) by default at start up. That means, the stack pointer(SP register) will be having a value of 07H at startup(incremented to 08H when stack is PUSHed). This probably limits the stack memory to 8 bytes, if register bank 2(starting from 10H) is occuppied. If register banks 2 and 3 are not used, even that can be taken up by the stack(08H to 1FH).
If in a given program we need more than 24 bytes (08 to 1FH = 24 bytes) of stack, we can change the SP to point to RAM locations 30 – 7FH. This is done with the instruction “MOV SP, #xx”. This should clarify doubts surrounding the 8051 stack usage.

Hitech C data buffers in program memory

The C18 compiler allows variables in program memory with ROM qualifier, but the Hi-Tech C seems rather reluctant to utilize the Havard architecture to its best. So is there a way to create data buffers in program memory with the Hi-Tech C compiler (I am ready to compromise access speed).
I've seen indications of possibility with the psect but don't have any working implementation.
The HI-TECH PICC18 compiler places objects declared as const into program space by default. No special qualifiers like C18's RAM/ROM are needed:
3.5.3 Objects in Program Space
const objects are usually placed in program space. On the PIC18 devices, the program space is
byte-wide, the compiler stores one character per byte location and values are read using the table
read instructions. All const-qualified data objects and string literals are placed in the const psect.
The const psect is placed at an address above the upper limit of RAM since RAM and const
pointers use this address to determine if an access to ROM or RAM is required.
Note that placing frequently updated data in the microcontroller's flash memory may not be such a good idea, as flash has a limited number of program/erase cycles.
far pointers can be used to dereference program memory:
3.4.12.2 Const and Far Pointers
const and far pointers can either be 16 or 24 bits wide. Their size can be toggled with the --CP=24
or --CP=16 command line option. The code used to dereference them also changes with their size.
The same pointer size must be used for all modules in a project.
A pointer to far is identical to a pointer to const, except that pointers to far may be used to
write to the address they hold. A pointer to const objects cannot be used to write as the const
qualifier imposes that the object is read-only.
const and far pointers which are 16 bits wide can access all RAM areas and most of the program
space. At runtime when dereferenced, the contents of the pointer are examined. For addresses above
the upper limit of RAM the program space is accessed using table read or table write instructions.
Addresses below the upper limit of RAM access the data space. Even if the address held by a pointer
to const is in RAM, the RAM location may not be changed.
The default linker options always place const data at addresses above the upper limit of the data
space so that the correct memory space is accessed when dereferencing with pointers.
If the target device selected has more than 64k bytes of program space memory, then only the
lower 64k bytes may be accessed with 16-bit wide pointers. Provided that all program space objects
that need to be dereferenced are in the lower 64k bytes, 16-bit pointers to const and far objects
may still be used. The smaller pointer size results in less RAM required and less code produced and
so should be used whenever possible.
const and far pointers which are 24 bits wide can access all RAM areas and all of the program
space. At runtime when dereferenced, the contents of the pointer are examined. If bit number 21
in the address is set, the address is assumed to be a RAM address. Bit number 21 of the address is
then ignored. If Bit number 21 is clear, then the address is assumed to be of an object in the program
space and the access is performed using table read or table write instructions. Again, no writes to
objects are permitted using a pointer to const.
Note that when dereferencing a 24-bit pointer, the most significant implemented bit (bit number
21) of the TBLPTRU register may be overwritten. This bit may be used to enable access to the
configuration area of the PIC18 device. If loading the table pointer registers from hand-written
assembler code, make no assumptions about the state of bit number 21 prior to executing table read
or write instructions.
The quotes are from HI-TECH PICC18 v9.51 manual.

CPU and memory communication

I'm programmer-beginner, but I want to understand the things a bit more deeply. I did some research and read quite a lot of text, but I'm still yet to understand some things..
When coding a basic thing (in C):
int myNumber;
myNumber = 3;
printf("Here's my number: %d", myNumber);
I found out that (mainly on 32-bit CPU) integer takes place of 32 bits = 4 bytes. So at first line of my code CPU goes into the memory. The memory is byte-addressable, so CPU chooses 4 continuous bytes for my variable and stores the address to first (or last) byte.
On the second line of my code CPU uses his stored address of the MyNumber variable, goes to that address in the memory and finds there 32 bits of reserved space. His task now is to store there the number "3", so he fills those four bytes with the sequence 00000000-00000000-00000000-00000011.
On the third line it does the same - CPU goes to that address in memory and loads the number stored in that address.
(First question - Do I understand it right?)
What I don't understand is this:
The size of that address (pointer to that variable) is 4bytes in 32-bit CPU. (Thats why 32-bit CPU can use max 4GB of memory - because there are only 2^32 different addresses of binary length 32)
Now, where the CPU stores these addresses? Does he have some sort of its own memory or cache to store that? And why it stores the 32 bit long address to 32 bit long integer? Wouldn't it be better to simply store in its cache that actual number than the pointer to that when the sizes are the same?
And last one - if it stores somewhere in its own cache the addresses to all those integers and the lenghts are the same (4 bytes), it will need exactly the same space for storing the addresses as for the actual variables. But variables can take up to 4GBs of space so CPU must have 4GB of its own space to store the addresses to those variables. And that sounds strange..
Thank you for help!
I'm trying to understand that but it's so tough.. :-[
(First question - Do I understand it right?)
The first thing to recognise is that the value might not be stored in main memory at all. The compiler might decide to store it in a register instead, as this is more optimal.1
The memory is byte-addressable, so CPU chooses 4 continuous bytes for my variable and stores the address to first (or last) byte.
Assuming that the compiler does decide to store it in main memory, then yes, on a 32-bit machine, an int is typically 4 bytes, so 4 bytes will be allocated for storage.
The size of that address (pointer to that variable) is 4bytes in 32-bit CPU. (Thats why 32-bit CPU can use max 4GB of memory - because there are only 2^32 different addresses of binary length 32)
Note that the width of an int and the width of a pointer don't have to be the same, so there's not necessarily a connection with the size of the address space.
Now, where the CPU stores these addresses?
In the case of local variables, the address is effectively hardcoded into the executable itself, typically as an offset from the stack pointer.
In the case of dynamically-allocated objects (i.e. stuff that's been malloc-ed), the programmer typically maintains a corresponding pointer variable (otherwise there would be a memory leak!). That pointer might also be dynamically-allocated (in the case of a complex data structure), but if you go back far enough, you'll eventually reach something that's a local variable. In which case, the above rule applies.
But variables can take up to 4GBs of space so CPU must have 4GB of its own space to store the addresses to those variables.
If your program consists of independently malloc-ing millions of ints, then yes, you'd end up with just as much storage required for the pointers. But most programs don't look like that. You typically allocate much bigger objects (like an array, or a big struct).
cache
The specifics of where stuff is stored is architecture-specific. On a modern x86, there's typically 2 or 3 layers of cache sitting between the CPU and main memory. But the cache is not independently addressable; the CPU cannot decide to store the int in cache instead of main memory. Rather, the cache is effectively a redundant copy of a subset of main memory.
Another thing to consider is that the compiler will typically deal with virtual addresses when allocating storage for objects. On a modern x86, these are mapped to physical addresses (i.e. addresses that correspond to physical storage bytes in main memory) by dedicated hardware, along with OS support.
1. Alternatively, the compiler may be able to optimise it away entirely.
On the second line of my code CPU uses his stored address of the MyNumber variable, goes to that address in the memory and finds there 32 bits of reserved space.
Nearly correct. Memory is basically unstructured. The CPU can't see that there are 32 bits of "reserved space". But the CPU was instructed to read 32 bits of data, so it reads 32 bits of data starting from the specified address. Then it just has to hope/assume that those 32 bits actually contain something meaningful.
Now, where the CPU stores these addresses? Does he have some sort of its own memory or cache to store that? And why it stores the 32 bit long address to 32 bit long integer? Wouldn't it be better to simply store in its cache that actual number than the pointer to that when the sizes are the same?
The CPU has a small number of registers, which it can use to store data (common CPUs have 8, 16 or 32 registers, so they can only hold the particular variables that you're working with here and now). So to answer the last part first, yes, the compiler certainly might (and probably will) generate code to just store your int into a register, instead of storing it in a memory, and telling the CPU to load it from a specified address.
As for the other part of the question: ultimately, every part of the program is stored in memory. Part of it is a stream of instructions, and part of it is in chunks of data scattered around memory.
There are a few tricks that help with locating the data the CPU needs: part of the program's memory contains the stack, which typically stores local variables while they're in scope. The CPU always maintains a pointer to the top of the stack in one of its registers, so it can easily locate data on the stack, simply by modifying the stack pointer with a fixed offset. Instructions can directly contain such offsets, so in order to read your int, the compiler could for example generate code which writes the int to the top of the stack when you enter the function, and then when you need to refer to that function, have code which reads the data found at the address the stack pointer points to, plus the small offset needed to locate your variable.
And also keep in mind that the adresses your program sees may not be (or rather rarely are) physical adresses starting from the 'beginning of the memory' or 0. Mostly they are offsets into a specifc memory block where the memory manager knows the real address and the accesses via base+offest as the real data storage.
And we do need the memory since caches are limited ;-)
Mario
Internal to the CPU there is one register that contains the address of the next instruction to be executed. The instructions themselves keep information where the variable is. If the variable is optimized, the instruction may point to a register, but in general, the instruction will have an address of the variable being accessed. Your code, after compiled and loaded in memory, have all that embedded! I recommend looking into assembly language to get a better understanding of all that. Good luck!

What is the difference between far pointers and near pointers?

Can anybody tell me the difference between far pointers and near pointers in C?
On a 16-bit x86 segmented memory architecture, four registers are used to refer to the respective segments:
DS → data segment
CS → code segment
SS → stack segment
ES → extra segment
A logical address on this architecture is written segment:offset. Now to answer the question:
Near pointers refer (as an offset) to the current segment.
Far pointers use segment info and an offset to point across segments. So, to use them, DS or CS must be changed to the specified value, the memory will be dereferenced and then the original value of DS/CS restored. Note that pointer arithmetic on them doesn't modify the segment portion of the pointer, so overflowing the offset will just wrap it around.
And then there are huge pointers, which are normalized to have the highest possible segment for a given address (contrary to far pointers).
On 32-bit and 64-bit architectures, memory models are using segments differently, or not at all.
Since nobody mentioned DOS, lets forget about old DOS PC computers and look at this from a generic point-of-view. Then, very simplified, it goes like this:
Any CPU has a data bus, which is the maximum amount of data the CPU can process in one single instruction, i.e equal to the size of its registers. The data bus width is expressed in bits: 8 bits, or 16 bits, or 64 bits etc. This is where the term "64 bit CPU" comes from - it refers to the data bus.
Any CPU has an address bus, also with a certain bus width expressed in bits. Any memory cell in your computer that the CPU can access directly has an unique address. The address bus is large enough to cover all the addressable memory you have.
For example, if a computer has 65536 bytes of addressable memory, you can cover these with a 16 bit address bus, 2^16 = 65536.
Most often, but not always, the data bus width is as wide as the address bus width. It is nice if they are of the same size, as it keeps both the CPU instruction set and the programs written for it clearer. If the CPU needs to calculate an address, it is convenient if that address is small enough to fit inside the CPU registers (often called index registers when it comes to addresses).
The non-standard keywords far and near are used to describe pointers on systems where you need to address memory beyond the normal CPU address bus width.
For example, it might be convenient for a CPU with 16 bit data bus to also have a 16 bit address bus. But the same computer may also need more than 2^16 = 65536 bytes = 64kB of addressable memory.
The CPU will then typically have special instructions (that are slightly slower) which allows it to address memory beyond those 64kb. For example, the CPU can divide its large memory into n pages (also sometimes called banks, segments and other such terms, that could mean a different thing from one CPU to another), where every page is 64kB. It will then have a "page" register which has to be set first, before addressing that extended memory. Similarly, it will have special instructions when calling/returning from sub routines in extended memory.
In order for a C compiler to generate the correct CPU instructions when dealing with such extended memory, the non-standard near and far keywords were invented. Non-standard as in they aren't specified by the C standard, but they are de facto industry standard and almost every compiler supports them in some manner.
far refers to memory located in extended memory, beyond the width of the address bus. Since it refers to addresses, most often you use it when declaring pointers. For example: int * far x; means "give me a pointer that points to extended memory". And the compiler will then know that it should generate the special instructions needed to access such memory. Similarly, function pointers that use far will generate special instructions to jump to/return from extended memory. If you didn't use far then you would get a pointer to the normal, addressable memory, and you'd end up pointing at something entirely different.
near is mainly included for consistency with far; it refers to anything in the addressable memory as is equivalent to a regular pointer. So it is mainly a useless keyword, save for some rare cases where you want to ensure that code is placed inside the standard addressable memory. You could then explicitly label something as near. The most typical case is low-level hardware programming where you write interrupt service routines. They are called by hardware from an interrupt vector with a fixed width, which is the same as the address bus width. Meaning that the interrupt service routine must be in the standard addressable memory.
The most famous use of far and near is perhaps the mentioned old MS DOS PC, which is nowadays regarded as quite ancient and therefore of mild interest.
But these keywords exist on more modern CPUs too! Most notably in embedded systems where they exist for pretty much every 8 and 16 bit microcontroller family on the market, as those microcontrollers typically have an address bus width of 16 bits, but sometimes more than 64kB memory.
Whenever you have a CPU where you need to address memory beyond the address bus width, you will have the need of far and near. Generally, such solutions are frowned upon though, since it is quite a pain to program on them and always take the extended memory in account.
One of the main reasons why there was a push to develop the 64 bit PC, was actually that the 32 bit PCs had come to the point where their memory usage was starting to hit the address bus limit: they could only address 4GB of RAM. 2^32 = 4,29 billion bytes = 4GB. In order to enable the use of more RAM, the options were then either to resort to some burdensome extended memory solution like in the DOS days, or to expand the computers, including their address bus, to 64 bits.
Far and near pointers were used in old platforms like DOS.
I don't think they're relevant in modern platforms. But you can learn about them here and here (as pointed by other answers). Basically, a far pointer is a way to extend the addressable memory in a computer. I.E., address more than 64k of memory in a 16bit platform.
A pointer basically holds addresses. As we all know, Intel memory management is divided into 4 segments.
So when an address pointed to by a pointer is within the same segment, then it is a near pointer and therefore it requires only 2 bytes for offset.
On the other hand, when a pointer points to an address which is out of the segment (that means in another segment), then that pointer is a far pointer. It consist of 4 bytes: two for segment and two for offset.
Four registers are used to refer to four segments on the 16-bit x86 segmented memory architecture. DS (data segment), CS (code segment), SS (stack segment), and ES (extra segment). A logical address on this platform is written segment:offset, in hexadecimal.
Near pointers refer (as an offset) to the current segment.
Far pointers use segment info and an offset to point across segments. So, to use them, DS or CS must be changed to the specified value, the memory will be dereferenced and then the original value of DS/CS restored. Note that pointer arithmetic on them doesn't modify the segment portion of the pointer, so overflowing the offset will just wrap it around.
And then there are huge pointers, which are normalized to have the highest possible segment for a given address (contrary to far pointers).
On 32-bit and 64-bit architectures, memory models are using segments differently, or not at all.
Well in DOS it was kind of funny dealing with registers. And Segments. All about maximum counting capacities of RAM.
Today it is pretty much irrelevant. All you need to read is difference about virtual/user space and kernel.
Since win nt4 (when they stole ideas from *nix) microsoft programmers started to use what was called user/kernel memory spaces.
And avoided direct access to physical controllers since then. Since then dissapered a problem dealing with direct access to memory segments as well. - Everything became R/W through OS.
However if you insist on understanding and manipulating far/near pointers look at linux kernel source and how it works - you will newer come back I guess.
And if you still need to use CS (Code Segment)/DS (Data Segment) in DOS. Look at these:
https://en.wikipedia.org/wiki/Intel_Memory_Model
http://www.digitalmars.com/ctg/ctgMemoryModel.html
I would like to point out to perfect answer below.. from Lundin. I was too lazy to answer properly. Lundin gave very detailed and sensible explanation "thumbs up"!

Resources