What does RAM_HIGH_ADRS really mean in a VxWorks BSP?

RAM_HIGH_ADRS is a parameter defined in config.h and in the makefile. As I understand it, it defines the address at which the program's text+data+bss segments are written in RAM.
That would mean, for example, that if the CPU has 64 MB of RAM and RAM_HIGH_ADRS equals 0x00A00000 (10 MB), the program has 54 MB to work with for storing text+data+bss+heap+stack.
The reason I'm questioning this is that I am working on a project where I expanded the data segment by a large margin, which caused the CPU to stop booting. I then increased RAM_HIGH_ADRS, which allowed the CPU to boot again. This confuses me, since the only thing written between RAM_LOW_ADRS and RAM_HIGH_ADRS, to my understanding, is the VxWorks image, so increasing RAM_HIGH_ADRS should only reduce the space available for the data segment.

If you are using the VxWorks bootrom to boot the board, here is how it works.
The bootrom is placed at RAM_HIGH_ADRS. The bootrom then loads the VxWorks kernel image from the network (or from wherever else you are fetching the image) and places it in RAM starting at RAM_LOW_ADRS.
It places the .text segment first, followed immediately by .rodata, .data, and .bss. Therefore there has to be enough space between RAM_LOW_ADRS and RAM_HIGH_ADRS to accommodate .text+.rodata+.data+.bss.
If there is not enough space, you will see the symptom you describe. In that case, set RAM_HIGH_ADRS to a higher value so that .text+.rodata+.data+.bss fits between RAM_LOW_ADRS and RAM_HIGH_ADRS.

from vxworks-bsps-6.7.pdf page 6:
High RAM address. When the bootrom is used, the boot loader places the
small VxWorks kernel (the bootrom) at high RAM. The
RAM_LOW_ADRS..RAM_HIGH_ADRS is used by the bootrom kernel to store the
VxWorks kernel fetched from the network before booting. Usually set to
half main memory + 0x3000, for example 0x40203000 on a system with 4Mb
RAM.

Related

How is a C program divided into pages and then put inside RAM frames according to this scenario?

Look at this scheme and then answer the following questions:
As you can see, there is a simple C program roughly converted into assembly instructions. For the sake of simplicity, let's suppose each instruction is 3 bytes long, and that the page and frame size is also 3 bytes.
Is this flow of procedures correct?
Is the program really scattered into pages and then put into RAM frames like that?
If so, how is the system able to associate specific pages with the specific segments they belong to?
I read in an OS book that segmentation and paging can coexist. Is this scenario related to that?
Please answer all these questions and also try to clear up the confusion regarding the coexistence of segmentation and paging. Good reference material on this subject is also very much appreciated, specifically a concrete, real example with a C program rather than abstract descriptions.
Is this flow of procedures correct?
Is the program really scattered into pages and then put into RAM frames like that?
In principle: Yes.
Please note that the sections ("Code", "Data", "Stack" and "Heap") are only names of ranges in the memory. For the CPU, there is no "border" between "Data" and "Stack".
If so how is the system capable of associating specific pages ...
I'm simplifying a lot in the next lines:
Reading memory works the following way: The CPU sends some address to the memory (e.g. RAM) and it sends a signal that data shall be read. The memory then sends the data stored at this address back to the CPU.
Until the early 1990s, there were computers whose CPU and so-called MMU were two separate chips. The MMU was placed between the CPU and the memory. When the CPU sends an address to the memory, the address (cpu_address) is actually sent to the MMU. The MMU then performs an operation similar to the following C code:
int page_table[NUM_PAGES];   /* filled in by the operating system */

new_address = page_table[cpu_address / PAGE_SIZE] * PAGE_SIZE
            + cpu_address % PAGE_SIZE;
... the MMU sends the address new_address to the memory.
The MMU is built in a way that the operating system can write to the page_table.
Let's say the instruction LOAD R1,[XXX1] is stored at the "virtual address" 1234. This means: The CPU thinks that the instruction is stored at address 1234. The CPU has executed the instruction STORE [XXX2],5 and wants to execute the next instruction. For this reason it must read the next instruction from the memory, so it sends the address 1234 to the memory.
However, the address is not received by the memory but by the MMU. The MMU now calculates: 1234/3 = 411, 1234%3 = 1. Now the MMU looks up the page table; let's say that entry #411 is 56. Then it calculates: 56 * 3 + 1 = 169. It sends the address 169 to the memory so the memory will not return the data stored in address 1234, but the data stored in address 169.
Modern desktop CPUs (but not small microcontrollers) have the CPU and the MMU in one chip.
You may ask: Why is that done?
Let's say a program is compiled in a way that it must be loaded to address 1230 and you have another program that must also be loaded to address 1230. Without MMU it would not be possible to load both programs at the same time.
With an MMU, the operating system may load one program to address 100 and the other one to address 200. The OS will then modify the data in the page_table in a way the CPU accesses the currently running program when accessing address 1230.
... to the specific segments they belong to?
When compiling the program, the linker decides at which address some item (variable or function) is found. In the simplest case, it simply places the first function to a fixed address (for example always 1230) and appends all other functions and variables.
So if the code of your program is 100 bytes long, the first variable is found at address 1230 + 100 = 1330.
The executable file (created by the linker) contains some information that says that the code must be loaded to address 1230.
When the operating system loads the program, it checks its length. Let's say your program is 120 bytes long. It calculates 120/3=40, so 40 pages are needed.
The OS searches for 40 pages of RAM which are not used. Then it modifies the page_table in a way that address 1230 actually accesses the first free page, 1233 accesses the second free page, 1236 accesses the third free page and so on.
Now the OS loads the program to virtual address 1230-1349 (addresses sent from the CPU to the MMU); because of the MMU, the data will be written to the free pages in RAM.
I read in an OS book that segmentation and paging can coexist. Is this scenario related to that?
In this case, the word "segmentation" describes a special feature of 16- and 32-bit x86 CPUs; at least, I don't know of other CPUs supporting that feature. In modern x86 operating systems, this feature is no longer used.
Your example does not seem to use segmentation (though your drawing could be interpreted differently).
If you run a 16-bit Windows program on an old 32-bit Windows version (recent versions no longer support 16-bit programs), paging and segmentation are used at the same time.
specifically some concrete real example with C program and not abstract stuff
Unfortunately, a concrete real example would be even more difficult to understand because the page sizes are much larger than 3 bytes (4 kilobytes on x86 CPUs) ...

Does the virtual memory area struct only come into the picture when there is a page fault?

Virtual memory is quite a complex topic for me, and I am trying to understand it. Here is my understanding for a 32-bit system where the example RAM is just 2 GB. I have tried reading many links and am not confident at the moment. I would like you to help me clear up my concepts. Please acknowledge my points, and answer where you feel something is wrong. I have also listed my confused points below. So, here starts the summary.
Every process thinks it is the only one running. It can access 4 GB of memory - its virtual address space.
When a process accesses a virtual address, it is translated to a physical address via the MMU.
This MMU is part of the CPU - it is hardware.
When the MMU cannot translate the address to a physical one, it raises a page fault.
On a page fault, the kernel is notified. The kernel checks the VM area struct. If it can find the address - it may be on disk - it will do some page-in/page-out and bring this memory into RAM.
Now the MMU will try again and will succeed this time.
If the kernel cannot find the address, it will raise a signal. For example, an invalid access raises SIGSEGV.
Confused points.
Is the page table maintained in the kernel? Does this VM area struct have a page table?
How can the MMU fail to find the address in physical RAM? Let's say it translates to some wrong address in RAM. The code would still execute, but with a bad address. How does the MMU ensure that it is reading the right data? Does it consult the kernel VM area every time?
Is the virtual-to-physical mapping table inside the MMU? I have read that it is maintained by each individual process. If it is inside a process, why can't I see it?
Or, if it is in the MMU, how does the MMU generate the address - is it segment + 12-bit shift -> page frame number, and then the addition of the offset (bits 0 to 11) -> physical address?
Does that mean that for a 32-bit architecture, with this calculation in mind, I can determine the physical address from a virtual address?
cat /proc/pid_value/maps shows me the current mapping of the VM areas. Basically it reads the VM area struct and prints it, so it must be important, but I am not able to fit this piece into the complete picture. Is the VM area struct generated when the program is executed? Does the VM area only come into the picture when the MMU cannot translate an address, i.e. on a page fault? When I print it, it displays the address range, permissions, the mapped file, and an offset. I am sure this file is the one on the hard disk and the offset is into that file.
The high-mem concept is that the kernel cannot directly access memory regions greater than about 1 GB, so it needs a page table to map them indirectly: it temporarily installs a page-table mapping for the address. Does HIGHMEM come into the picture every time? User space can translate addresses directly via the MMU. In what scenario does the kernel really want to access high memory? I believe kernel drivers mostly use kmalloc, which returns a direct memory + offset address, so no mapping is really required. So the question is: in what scenario does the kernel need to access high mem?
Does the processor specifically come with MMU support? Can processors that don't have an MMU not run Linux?
Is the page table maintained in the kernel? Does this VM area struct have a page table?
Yes, the kernel maintains the page tables, but the VM area struct does not contain one: each process has an mm_struct, which contains a list of vm_area_structs (which represent abstract, processor-independent memory regions, aka mappings) and a field called pgd, a pointer to the processor-specific page table (which contains the current state of each page: valid, readable, writable, dirty, ...).
The page table doesn't need to be complete, the OS can generate each part of it from the VMAs.
How can the MMU fail to find the address in physical RAM? Let's say it translates to some wrong address in RAM. The code would still execute, but with a bad address. How does the MMU ensure that it is reading the right data? Does it consult the kernel VM area every time?
The translation fails, e.g. because the page was marked as invalid, or a write access was attempted against a readonly page.
Is the virtual-to-physical mapping table inside the MMU? I have read that it is maintained by each individual process. If it is inside a process, why can't I see it?
Or, if it is in the MMU, how does the MMU generate the address - is it segment + 12-bit shift -> page frame number, and then the addition of the offset (bits 0 to 11) -> physical address?
Does that mean that for a 32-bit architecture, with this calculation in mind, I can determine the physical address from a virtual address?
There are two kinds of MMUs in common use. One of them only has a TLB (Translation Lookaside Buffer), which is a cache of the page table. When the TLB doesn't have a translation for an attempted access, a TLB miss is generated, the OS does a page table walk, and puts the translation in the TLB.
The other kind of MMU does the page table walk in hardware.
In any case, the OS maintains a page table per process, this maps Virtual Page Numbers to Physical Frame Numbers. This mapping can change at any moment, when a page is paged-in, the physical frame it is mapped to depends on the availability of free memory.
cat /proc/pid_value/maps shows me the current mapping of the VM areas. Basically it reads the VM area struct and prints it, so it must be important, but I am not able to fit this piece into the complete picture. Is the VM area struct generated when the program is executed? Does the VM area only come into the picture when the MMU cannot translate an address, i.e. on a page fault? When I print it, it displays the address range, permissions, the mapped file, and an offset. I am sure this file is the one on the hard disk and the offset is into that file.
To a first approximation, yes. Beyond that, there are many reasons why the kernel may decide to fiddle with a process' memory, e.g: if there is memory pressure it may decide to page out some rarely used pages from some random process. User space can also manipulate the mappings via mmap(), execve() and other system calls.
The high-mem concept is that the kernel cannot directly access memory regions greater than about 1 GB, so it needs a page table to map them indirectly: it temporarily installs a page-table mapping for the address. Does HIGHMEM come into the picture every time? User space can translate addresses directly via the MMU. In what scenario does the kernel really want to access high memory? I believe kernel drivers mostly use kmalloc, which returns a direct memory + offset address, so no mapping is really required. So the question is: in what scenario does the kernel need to access high mem?
Totally unrelated to the other questions. In summary, high memory is a hack to be able to access lots of memory in a limited address space computer.
Basically, the kernel has a limited address space reserved to it. On x86, a typical user/kernel split is 3 GB/1 GB. (Processes can run in user space or kernel space; a process runs in kernel space when a syscall is invoked. To avoid having to switch the page table on every context switch, on x86 the address space is typically split between user space and kernel space.) So the kernel can directly access up to ~1 GB of memory. To access more physical memory, some indirection is involved, which is what high memory is all about.
Does the processor specifically comes with the MMU support. Those who doesn't have MMU support cannot run Linux?
Laptop/desktop processors come with an MMU. x86 has supported paging since the 386.
Linux, particularly the variant called µClinux, supports processors without an MMU (!MMU). Many embedded systems (ADSL routers, ...) use processors without an MMU. There are some important restrictions, among them:
Some syscalls don't work at all: e.g. fork().
Some syscalls work with restrictions and non-POSIX-conforming behavior: e.g. mmap().
The executable file format is different: e.g. bFLT or ELF-FDPIC instead of ELF.
The stack cannot grow, and its size has to be set at link time.
When a program is loaded, the kernel first sets up the kernel VM areas for that process, right? This kernel VM area actually records where the program's sections are in memory/on the HDD. Then the whole story of updating the CR3 register and the page table walk or TLB comes into the picture, right? So whenever there is a page fault, the kernel updates the page table by looking at the kernel virtual memory areas? But they say the kernel VM areas keep updating - how is this possible? cat /proc/pid_value/maps keeps changing; the map won't be constant from start to end. So the real information is in the kernel VM area struct? That is the actual record of where each section of the program lies, whether on the HDD or in physical RAM? Is it filled in during process loading, as the first job? The kernel does page-in/page-out on a page fault and updates the kernel VM areas? Then it should also know the program's location on the HDD for page-in/page-out, right? Please correct me here. This is in continuation of my first question in the previous comment.
When the kernel loads a program, it sets up several VMAs (mappings) according to the segments in the executable file (which, for ELF files, you can see with readelf --segments): the text/code segment, the data segment, etc. During the lifetime of the program, additional mappings may be created by the dynamic/runtime linkers, by the memory allocator (malloc(), which may also extend the data segment via brk()), or directly by the program via mmap(), shm_open(), etc.
The VMAs contain the necessary information to generate the page table, e.g. they tell whether that memory is backed by a file or by swap (anonymous memory). So, yes, the kernel will update the page table by looking at the VMAs. The kernel will page in memory in response to page faults, and will page out memory in response to memory pressure.
Using x86 no PAE as an example:
On x86 with no PAE, a linear address can be split into 3 parts: the top 10 bits point to an entry in the page directory, and the middle 10 bits point to an entry in the page table pointed to by the aforementioned page directory entry. The page table entry may contain a valid physical frame number: the top 20 bits of a physical address. The bottom 12 bits of the virtual address are an offset into the page that goes untranslated into the physical address.
Each time the kernel schedules a different process, the CR3 register is written with a pointer to the page directory of the current process. Then, on each memory access, the MMU tries to look up a translation cached in the TLB; if it doesn't find one, it looks for one by doing a page table walk starting from CR3. If it still doesn't find one, a page fault is raised, the CPU switches to Ring 0 (kernel mode), and the kernel tries to find one in the VMAs.
Also, I believe this chain - reading CR3, page directory -> page table -> page frame number -> memory address - is all done by the MMU. Am I correct?
On x86, yes, the MMU does the page table walk. On other systems (e.g: MIPS), the MMU is little more than the TLB, and on TLB miss exceptions the kernel does the page table walk by software.
Though this is not going to be the best answer, I would like to share my thoughts on the confused points.
1. Does Page table is maintained...
Yes, the kernel maintains the page tables. In fact it maintains nested page tables, and the top of the page tables is stored in top_pmd (pmd stands for page middle directory). You can traverse all the page tables using this structure.
2. How MMU cannot find the address in physical RAM.....
I am not sure I understood the question. But if, because of some problem, a faulty instruction is fetched or something outside the instruction area is accessed, you generally get an undefined instruction exception, resulting in an abort. If you look at the crash dumps, you can see it in the kernel log.
3. Is the Mapping table - virtual to physical is inside a MMU...
Yes. The MMU is SW + HW; the HW part is the TLB and so on, and the mapping tables are stored there. For instructions, i.e. the code section, I have always converted between physical and virtual addresses and they always matched. It matches for the data sections almost all the time as well.
4. cat /proc/pid_value/maps. This shows me the current mapping of the vmarea....
This is mostly used for analyzing the virtual addresses of user-space stacks. As you know, virtually all user-space programs can have 4 GB of virtual address space, so unlike in the kernel, if I give you an address like 0xc0100234 you cannot directly go and point to the instruction. You need this mapping plus the virtual address to locate the instruction based on the data you have.
5. The high-mem concept is that kernel cannot directly access the Memory...
High-mem corresponds to user-space memory (someone correct me if I am wrong). When the kernel wants to read some data from an address in user space, it will be accessing HIGHMEM.
6. Does the processor specifically comes with the MMU support. Those who doesn't have MMU support cannot run LInux?
The MMU, as I mentioned, is HW + SW. So it mostly comes with the chipset, and the SW part is generally architecture dependent. You can disable the MMU in the kernel config and build; I have never tried it, though. These days almost all chipsets have one, but I think some small boards disable the MMU. I am not entirely sure, though.
As these are all conceptual questions, I may be lacking some knowledge and be wrong in places. If so, others please correct me.

example of address translation

I have a doubt with respect to the address space.
I had thought that 4 GB of RAM is split into two halves, for kernel space (1 GB) and user space (3 GB).
1] Does RAM also maintain stack, heap, code and data sections, as the hard disk does?
2] Isn't the running process given a boundary within which the stack, data, code and heap have to grow in RAM?
3] My thought was that the stack, heap, code and data segments would all be in the consecutive address space given to the process at the time of process creation.
4] How does the CPU obtain the correct address of the process to execute, given that processes are not contiguous in physical memory?
No, only the virtual memory address space is split in two. Physical memory, the RAM in the machine, contains an essentially random collection of blocks that map to virtual memory addresses - pages from both the operating system and user programs. Much like the image shows, although it is a bit misleading in showing the OS pages at the bottom.
That mapping constantly changes, a page fault is the essential mechanism to get a virtual memory page mapped to RAM. Which is triggered when a program accesses a virtual memory page that isn't present in RAM yet. As needed, RAM pages may be unmapped to make room, their content is either discarded or written to the pagefile. Code is usually discardable, it can be read back from the executable file, data usually isn't.
Some pages in RAM are special: they contain code and data used by drivers, and they are page-locked. This is required because the driver handles device interrupts, and the code/data used by the interrupt handler must be present in RAM for the interrupt to be handled; you can't afford a page fault at such a critical time. That is probably why the image was drawn the way it was.

Significance of Reset Vector in Modern Processors

I am trying to understand in great detail how a computer boots up.
I came across two things which made me more curious,
1. RAM is placed at the bottom of ROM, to avoid memory holes, as in the Z80 processor.
2. A reset vector is used, which takes the processor to a memory location in ROM whose contents point to the actual location (again in ROM) from which the processor actually starts executing instructions (the POST code). Why so?
If you still can't understand me, this link explains it briefly:
http://lateblt.tripod.com/bit68.txt
The processor logic is generally rigid and fixed, hence the term hardware. Software is something that can be changed, molded, etc., hence the term software.
The hardware needs to start somehow. There are two basic methods:
1) an address hardcoded in the logic, within the processor's memory space, is read, and that value is the address at which to start executing code
2) an address hardcoded in the logic is where the processor starts executing code
When the processor itself is integrated with other hardware, anything can be mapped into any part of the address space. You can put RAM at address 0x1000 or 0x40000000 or both. You can map a peripheral to 0x1000 or 0x4000 or 0xF0000000 or all of the above. It is the choice of the system designers, or a combination of the teams of engineers, where things will go. One important factor is how the system will boot once reset is released. How the processor boots is well defined by its architecture. Designers often choose one of two paths:
1) put a ROM in the memory space that contains the reset vector or the entry point, depending on the boot method of the processor (no matter what the architecture, there is a first address or first block of addresses that is read, and its contents drive the booting of the processor). The software places code, a vector table, or both in this ROM so that the processor will boot and run.
2) put RAM in the memory space, in such a way that some host can download a program into that RAM and then release reset on the processor. The processor then follows its hardcoded boot procedure and the software is executed.
The first one is most common, the second is found in some peripherals, mice and network cards and things like that (Some of the firmware in /usr/lib/firmware/ is used for this for example).
The bottom line, though, is that the processor is usually designed with one fixed boot method, so that all software written for that processor can conform to that one method and not have to keep changing. Also, when the processor is designed it doesn't know its target application, so it needs a generic solution. The target application often defines the memory map (what is where in the processor's memory space), and one of the tasks in that assignment is deciding how the product will boot. From there the software is compiled and placed such that it conforms to the processor's rules and the product's hardware rules.
It completely varies by architecture. There are a few reasons why cores might want to do this, though. Embedded cores (think along the lines of ARM and MicroBlaze) tend to be used within system-on-chip machines with a single address space. Such architectures can have multiple memories all over the place and tend only to dictate that the bottom area of memory (i.e. 0x00) contains the interrupt vectors. This then allows the programmer to easily specify where to boot from. On MicroBlaze, you can attach memory wherever you like in XPS.
In addition, it can be used to easily support bootloaders. These are typically small programs that do a bit of initialization and then fetch a larger program from a medium that can't be accessed simply (e.g. USB or Ethernet). In these cases, the bootloader typically copies itself to high memory, fetches the larger program into memory below itself, and then jumps there. The reset vector simply allows the programmer to bypass the first step.

How many bytes does the cache controller fetch at a time from main memory into the L2 cache?

I just read two articles on this topic that provide inconsistent information, so I want to know which one is correct. Perhaps both are correct, but in what context?
The first one states that we fetch a page at a time:
The cache controller is always observing the memory positions being loaded and loading data from several memory positions after the memory position that has just been read.
To give you a real example, if the CPU loaded data stored at address 1,000, the cache controller will load data from "n" addresses after address 1,000. This number "n" is called a page; if a given processor is working with 4 KB pages (which is a typical value), it will load data from the 4,096 addresses following the current memory position being loaded (address 1,000 in our example). In the following figure, we illustrate this example.
The second one states that we fetch sizeof(cache line) + sizeof(prefetch) at a time:
So we can summarize how the memory cache works as:
The CPU asks for the instruction/data stored at address "a".
Since the contents of address "a" aren't inside the memory cache, the CPU has to fetch them directly from RAM.
The cache controller loads a line (typically 64 bytes) starting at address "a" into the memory cache. This is more data than the CPU requested, so if the program continues to run sequentially (i.e. asks for address a+1), the next instruction/data the CPU asks for will already be loaded in the memory cache.
A circuit called the prefetcher loads more data located after this line, i.e. starts loading the contents from address a+64 onwards into the cache. To give you a real example, Pentium 4 CPUs have a 256-byte prefetcher, so they load the next 256 bytes after the line already loaded into the cache.
Completely hardware implementation dependent. Some implementations load a single line from main memory at a time, and cache line sizes vary a lot between different processors. I've seen line sizes from 64 bytes all the way up to 256 bytes. Basically, the size of a "cache line" means that when the CPU requests memory from main RAM, it does so n bytes at a time. So if n is 64 bytes and you load a 4-byte integer at 0x1004, the MMU will actually read 64 bytes across the bus: all the addresses from 0x1000 through 0x103F. This entire chunk of data is stored in the data cache as one line.
Some MMUs can fetch multiple cache lines across the bus per request, so that making a request at address 0x1000 on a machine with 64-byte cache lines actually loads four lines, from 0x1000 through 0x10FF. Some systems let you do this explicitly with special cache prefetch or DMA opcodes.
The article through your first link, however, is completely wrong. It confuses the size of an OS memory page with a hardware cache line. These are totally different concepts. The first is the minimum size of virtual address space the OS will allocate at once. The latter is a detail of how the CPU talks to main RAM.
They resemble each other only in the sense that when the OS runs low on physical memory, it will page some not-recently-used virtual memory to disk; then later on, when you use that memory again, the OS loads that whole page from disk back into physical RAM. This is analogous (but not related) to the way that the CPU loads bytes from RAM, which is why the author of "Hardware Secrets" was confused.
A good place to learn all about computer memory and why caches work the way they do is Ulrich Drepper's paper, What Every Programmer Should Know About Memory.
