embedded linux ARM booting address - arm

I am following a document to boot embedded Linux on an ARM board (e.g. the Freescale Vybrid Tower) from an SD card. The document has steps to build the uImage and write U-Boot to the SD card, as below:
sudo dd if=u-boot.imx of=/dev/sdX bs=512 seek=2
mkimage -A arm64 -O linux -T kernel -C none -a 0x81000000 -e 0x81000000 -n "Linux" -d Image uImage
What I would like to know is from which datasheet/UM/RM or any document they get the number: bs=512 seek=2, -a 0x81000000 (Load address), -e 0x81000000 (Entry point)
Please also explain what Load address/entry point address mean?

What I would like to know is from which datasheet/UM/RM or any document they get the number: bs=512 seek=2, -a 0x81000000 (Load address), -e 0x81000000 (Entry point)
The bs=512 seek=2 specification should be from the NXP/Freescale reference manual for the SoC (e.g. the "Expansion Device: SD, eSD and SDXC" section of the System Boot chapter).
When configured to boot from an SDcard, the ROM boot program (of the SoC) will look for a program image (e.g. U-Boot) at byte offset 0x400 (or 2 * 512 = 1024), which is the third 512-byte sector.
The first sector is presumed to be the MBR, and the second sector is reserved for an optional Secondary Image Table (using the terminology of the NXP document).
Allwinner SoCs use a similar booting scheme for SDcard (i.e. the U-Boot image is at a fixed location in raw sectors not part of a partition), but the image starts at the 17th sector.
Instead of loading raw sectors, some SoCs (e.g. Atmel) boot from SDcard by loading a file from a FAT partition.
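The offset arithmetic above can be checked with a small sketch. The helper names are made up for illustration; the only inputs are the dd arguments from the question and the sector positions mentioned in this answer.

```python
SECTOR_SIZE = 512  # bytes; matches bs=512 in the dd command


def dd_seek_offset(bs: int, seek: int) -> int:
    """Byte offset in the output at which dd starts writing."""
    return bs * seek


def sector_ordinal(byte_offset: int, sector_size: int = SECTOR_SIZE) -> int:
    """1-based position of the sector that contains byte_offset."""
    return byte_offset // sector_size + 1


# dd ... bs=512 seek=2 (the i.MX/Vybrid style layout described above)
imx_offset = dd_seek_offset(bs=512, seek=2)
print(hex(imx_offset), sector_ordinal(imx_offset))  # 0x400 3

# An image that starts at the 17th sector (the Allwinner case above)
allwinner_offset = (17 - 1) * SECTOR_SIZE
print(hex(allwinner_offset))  # 0x2000
```

So seek=2 leaves sectors 1 and 2 (the MBR and the optional table) untouched and starts writing at the third sector.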
Please also explain what Load address/entry point address mean?
These values are specified to the mkimage utility so that they can be installed in the uImage header. U-Boot will then use these values when the uImage is loaded and unpacked.
The load address specifies to U-Boot the required memory address to locate the image. The image is copied to that memory address.
The entry point specifies to U-Boot the memory address to jump/branch to in order to execute the image. This value is typically the same address as the load address.
For an ARM Linux kernel the recommended load and entry-point addresses are 0x8000 from the start of physical memory, according to (Vincent Sanders') Booting ARM Linux.
See Building kernel uImage using LOADADDR for more details.
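As a rough illustration of where the -a and -e values end up, here is a sketch that packs and parses a legacy 64-byte uImage header. The field layout and the numeric codes (OS 5 = Linux, arch 2 = ARM, type 2 = kernel, compression 0 = none) are my understanding of U-Boot's legacy image format and should be checked against the U-Boot sources; the header built here is synthetic and its CRC fields are left at zero, so it is not a valid bootable image.

```python
import struct

# Legacy header: seven big-endian 32-bit words, four single bytes, 32-byte name.
HEADER = struct.Struct(">7I4B32s")
IH_MAGIC = 0x27051956


def parse_header(raw: bytes) -> dict:
    """Extract the fields U-Boot needs to place and start the image."""
    (magic, _hcrc, _time, size, load, entry, _dcrc,
     os_id, arch, img_type, comp, name) = HEADER.unpack(raw[:HEADER.size])
    if magic != IH_MAGIC:
        raise ValueError("not a legacy uImage header")
    return {
        "size": size,
        "load": load,    # value given to mkimage with -a
        "entry": entry,  # value given to mkimage with -e
        "os": os_id,
        "arch": arch,
        "type": img_type,
        "comp": comp,
        "name": name.rstrip(b"\0").decode("ascii"),
    }


# Synthetic header using the addresses from the question (CRCs not computed).
raw = HEADER.pack(IH_MAGIC, 0, 0, 0, 0x81000000, 0x81000000, 0,
                  5, 2, 2, 0, b"Linux")
hdr = parse_header(raw)
print(hex(hdr["load"]), hex(hdr["entry"]), hdr["name"])
```

On a real image, `mkimage -l uImage` prints the same fields.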

Please also explain what Load address/entry point address mean?
Load address: the address at which the kernel is loaded. U-Boot copies the image to that region of memory. The address depends on the board design/architecture; in a typical design it is a RAM address. You need to check your board specification.
Entry point: the address to which control/execution is transferred once the image has been written into RAM. (The code at this location is executed first when the bootloader invokes the kernel in RAM.)

What I would like to know is from which datasheet/UM/RM or any document they get the number: bs=512 seek=2, -a 0x81000000 (Load address), -e 0x81000000 (Entry point)
Please also explain what Load address/entry point address mean?
The bs=512 seek=2 is there to skip the first two 512-byte sectors (1024 bytes) of the SD card. The start of the card holds boot information (the MBR, or master boot record, which contains the partition table) and you will brick the card if you overwrite this information (or at least need other tools to fix it). It is defined in an MMC/SD card standard. I think the JEDEC website has it.
The load address is where to move the SD card image to in memory (i.e. SDRAM). The entry point is where to hand control once the image is loaded. Often they are the same if the boot code is written in assembler and a linker is used. However, sometimes a hard-coded vector table is at the start of the image and the entry point is somewhere in the middle. In any case, both are physical addresses. It could be 'IRAM' (internal static RAM) in the case of a smaller kernel, but it must be SDRAM for Linux (which requires your SDRAM to be working).
You may have issues with this if it is a custom board and not an off-the-shelf Vybrid Tower. Also, there are different Tower board revisions and they work differently; check the errata on them. Finally, different U-Boot versions support different boot modes, i.e. where U-Boot is stored and executed from. The addresses are in the Vybrid TRM, in the physical memory map for the Cortex-A5 CPU.

Related

U-Boot error while decompressing gzipped kernel

I have an older embedded device (PHYTEC phyCORE-LPC3250) that runs the ancient U-Boot 1.3.3.
The Linux kernel uImage gets copied to the NAND flash at 0x200000, then booted with: nboot 80100000 0 200000;bootm
This works fine if the uImage is derived from the self-expanding zImage, but according to this mailing list post it is preferable to have U-Boot perform the decompression itself.
So I have tried creating a uImage that contains a gzipped version of the normal kernel image, but decompression fails:
Image Name: Poky (Yocto Project Reference Di
Image Type: ARM Linux Kernel Image (gzip compressed)
Data Size: 4057248 Bytes = 3.9 MB
Load Address: 80008000
Entry Point: 80008000
Verifying Checksum ... OK
Uncompressing Kernel Image ... Error: inflate() returned -3
GUNZIP: uncompress or overwrite error - must RESET board to recover
This scenario is described in the FAQ, which suggests that the problem is running out of RAM. But I have 128 MB of RAM, starting at 0x80000000, and the uncompressed kernel is only 8 MB.
(I validated that the data in the uImage is in fact gzipped.)
First U-Boot copies the uImage from NAND to RAM at the specified address 0x80100000. Then it unpacks the kernel to the load address specified in the uImage header, 0x80008000.
Since our kernel is about 8 MB uncompressed, that means the kernel's memory from 0x80008000 to approximately 0x80800000 overlaps where we copied the uImage at 0x80100000.
If the uImage is not compressed, unpacking the kernel can use memmove which handles overlapping address ranges without issue. (If the load address is the address where we copied it in RAM, the kernel gets executed in-place.)
But for compressed uImages, if we overwrite the compressed data while decompressing, decompression will obviously fail.
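The overlap can be checked numerically. This is a sketch using the sizes and addresses quoted in the post, and it assumes the legacy uImage header adds 64 bytes in front of the payload. The alternative staging address 0x81000000 is a hypothetical value chosen only to show a non-overlapping case.

```python
def ranges_overlap(start_a: int, size_a: int, start_b: int, size_b: int) -> bool:
    """True if [start_a, start_a+size_a) and [start_b, start_b+size_b) intersect."""
    return start_a < start_b + size_b and start_b < start_a + size_a


UIMAGE_HEADER = 64                      # assumed legacy header size
staged_size = UIMAGE_HEADER + 4057248   # "Data Size" from the boot log
load_addr = 0x80008000                  # "Load Address" from the boot log
kernel_size = 8 * 1024 * 1024           # "about 8 MB" uncompressed

# nboot 80100000 ...: the compressed uImage sits inside the unpack area.
print(ranges_overlap(0x80100000, staged_size, load_addr, kernel_size))  # True

# Hypothetical staging address well above the end of the unpacked kernel.
print(ranges_overlap(0x81000000, staged_size, load_addr, kernel_size))  # False
```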

ARM LPC3250 execute instructions from external RAM

I am having trouble making the ARM execute instructions stored in external RAM.
I wrote a small program that blinks an LED on the LPC3250. The program runs properly if I download it to the internal RAM of the LPC3250 via the IAR online debugger, but it does not run if I put it in the external RAM.
The external RAM is a block of SRAM built in the SPARTAN-6(Xilinx FPGA), DATA width is 32-bits, memory depth is 4096, means address width is 12-bits. This RAM can be initialized through a COE file.
So I get the BIN file of the program from IAR, then convert it into a COE file, which is used to initialize the SRAM. But every time, the processor just executes the three E59FF018 (LDR PC, [PC, #0x18]) instructions at the beginning of the SRAM and never jumps to main().
I cannot figure out why. As the LPC3250 requires, I add 4 bytes (0x13579BD2) to the beginning of the BIN file with UltraEdit before generating the COE file. The LPC3250 user manual says it will start executing code at address 0xE0000004 of the external RAM if the value at 0xE0000000 is 0x13579BD2. In the COE file I can see 5 identical instructions (E59FF018) after 0x13579BD2.
Please tell me where I'm wrong and What I need to do exactly to make it right.
Well, I almost forgot that I asked this question 3 years ago. I have now found the cause: it is the address signal output by the ARM. I had assumed that the address signal from the ARM uses byte addressing, but it actually uses double-word (32-bit) addressing. So I should not drop the low two bits of the address signal in the FPGA; the ARM has already dropped them.
In a word, my problem is an addressing problem.
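A toy model of the mistake, assuming a 4096-word RAM as described in the question. The function names are invented for illustration and this is not the actual FPGA logic.

```python
DEPTH = 4096  # 32-bit words in the FPGA block RAM (12-bit index)


def index_from_byte_address(byte_addr: int) -> int:
    """RAM index when the incoming address counts bytes: drop the low two bits."""
    return (byte_addr >> 2) % DEPTH


def index_from_word_address(word_addr: int) -> int:
    """RAM index when the incoming address already counts 32-bit words."""
    return word_addr % DEPTH


# Byte offsets of the first four words, and what a word-addressed bus presents.
byte_offsets = [0x0, 0x4, 0x8, 0xC]
bus_values = [b >> 2 for b in byte_offsets]          # [0, 1, 2, 3]

correct = [index_from_word_address(a) for a in bus_values]
wrong = [index_from_byte_address(a) for a in bus_values]
print(correct)  # [0, 1, 2, 3]
print(wrong)    # [0, 0, 0, 0]
```

Dropping the two low bits a second time makes consecutive fetches land on the same RAM word, so the CPU never sees the rest of the program.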

What does RAM_HIGH_ADRS really mean in a VxWorks BSP?

RAM_HIGH_ADRS is a parameter defined in config.h and in the makefile. As I understand it, it defines the address at which the program's data+text+bss segments will be written in RAM.
This means, for example, that if the CPU has 64 MB of RAM and RAM_HIGH_ADRS equals 0x00A00000 (10 MB), the entire program has 54 MB to work with for storing text+data+bss+heap+stack.
The reason I'm questioning this is that I am working on a project where I expanded the data segment by a large margin, which caused the CPU to not boot. I then increased RAM_HIGH_ADRS, which allowed the CPU to boot. This confuses me, since the only thing that is written between RAM_LOW_ADRS and RAM_HIGH_ADRS, to my understanding, is the VxWorks image, so increasing RAM_HIGH_ADRS should only lower the available size for the data segment.
If you are using the VxWorks bootrom to boot the board, then here is how it works.
The bootrom gets placed at RAM_HIGH_ADRS. The bootrom then loads the VxWorks kernel image from the network (or from wherever you are fetching the VxWorks kernel image) and places it in RAM starting at RAM_LOW_ADRS.
First it places the .text segment, and right after that it places .rodata, .data, and .bss. Therefore there has to be enough space between RAM_LOW_ADRS and RAM_HIGH_ADRS to accommodate .text+.rodata+.data+.bss.
If there is not enough space, the user will see the symptom that you have seen. In that case, set RAM_HIGH_ADRS to a higher value so that .text+.rodata+.data+.bss can fit between RAM_LOW_ADRS and RAM_HIGH_ADRS.
From vxworks-bsps-6.7.pdf, page 6:
High RAM address. When the bootrom is used, the boot loader places the
small VxWorks kernel (the bootrom) at high RAM. The
RAM_LOW_ADRS..RAM_HIGH_ADRS is used by the bootrom kernel to store the
VxWorks kernel fetched from the network before booting. Usually set to
half main memory + 0x3000, for example 0x40203000 on a system with 4Mb
RAM.
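The constraint described above can be written as a small check. This is a sketch: the section sizes and RAM_LOW_ADRS value below are hypothetical, and the RAM base of 0x40000000 is inferred from the 0x40203000 example in the quoted text rather than stated in it.

```python
def image_fits(ram_low: int, ram_high: int,
               text: int, rodata: int, data: int, bss: int) -> bool:
    """True if .text+.rodata+.data+.bss loaded at ram_low ends at or below ram_high."""
    return ram_low + text + rodata + data + bss <= ram_high


def usual_ram_high(ram_base: int, ram_size: int) -> int:
    """The 'half main memory + 0x3000' rule of thumb from the quoted BSP guide."""
    return ram_base + ram_size // 2 + 0x3000


# 4 MB of RAM at an assumed base of 0x40000000 reproduces the quoted example.
print(hex(usual_ram_high(0x40000000, 4 * 1024 * 1024)))  # 0x40203000

# Hypothetical sizes: growing .data pushes the image past RAM_HIGH_ADRS.
low, high = 0x00100000, 0x00A00000
print(image_fits(low, high, 0x300000, 0x80000, 0x100000, 0x200000))  # True
print(image_fits(low, high, 0x300000, 0x80000, 0x500000, 0x200000))  # False
```

In the second case the loaded image would run into the bootrom that is executing at RAM_HIGH_ADRS, which matches the boot failure described in the question.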

Do two different programs load a shared library function at the same physical memory location?

I am using the OpenSSL shared library to do simple encryption with the AES_cbc_encrypt() function. If I use AES_cbc_encrypt() from two different programs, will both programs point to the same location in physical memory for this function?
My other questions are:
1. If I use a shared library, will it automatically be mapped to the same physical memory location by all programs that use it?
Or
2. Do I need to follow some other technique to force the programs to load the shared library at the same physical memory in RAM? (I don't think so; otherwise there would be no use for the shared memory concept. That is my understanding.)
3. How can I check whether both programs load the shared library function at the same physical location?
4. I calculate the location (virtual address) of the function in both programs using (&AES_cbc_encrypt); then, using the tool capture, I convert this virtual address (VPN) to a physical address (PFN). But I don't know how to calculate the physical address from this VPN/PFN info, so I am not able to compare further. Any clue?
For example, my virtual address is 0x400cb0.
Virtual address range (start-end): 00400000-00402000
Physical pages:
A600000000036E26
A60000000008A4C3
In my system:
Virtual address space: 48 bit
Physical address space: 36 bit
I am using GCC under Linux. Any help or pointer/link will be highly appreciated. Thanks in advance.
Read Drepper's paper How To Write Shared Libraries.
Shared libraries use position-independent code (to minimize relocation). They are mmap(2)-ed by the dynamic linker ld-linux(8). Linux processes have their address space in virtual memory, managed by the Linux kernel through paging.
The kernel will generally share read-only segments (e.g. the text segment) of shared libraries (so their pages do indeed use the same RAM for different processes).
You could use /proc/self/maps (or /proc/1234/maps for the process of pid 1234) to find out the memory mapping of a process. See proc(5).
You should not care about RAM pages (and applications don't directly see them). Only the kernel manages physical RAM (and it can move pages in RAM, page them out to disk, etc.) through the MMU.
See also mincore(2) & mlock(2). Read also about OOM & thrashing & swap space.
Read Advanced Linux Programming !
When I compile with the option -fPIC, I get the same virtual addresses (maybe coincidentally) as well as the same physical addresses for the whole library from both programs.
gcc -fPIC -o aes openssl_aes.c -lcrypto
This shows that the shared library is loaded into the same physical location.
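For the VPN/PFN question, here is a sketch of the arithmetic. It assumes the "Physical Page" values quoted in the question are /proc/<pid>/pagemap entries (bit 63 = page present, bits 0-54 = page frame number) and that pages are 4 KiB; neither assumption is confirmed by the post.

```python
PAGE_SHIFT = 12                  # assumed 4 KiB pages
PFN_MASK = (1 << 55) - 1         # bits 0-54 hold the page frame number
PRESENT = 1 << 63                # bit 63: page is present in RAM


def physical_address(vaddr: int, region_start: int, entries: list) -> int:
    """Translate vaddr using one pagemap entry per page of the region."""
    page_index = (vaddr - region_start) >> PAGE_SHIFT
    entry = entries[page_index]
    if not entry & PRESENT:
        raise LookupError("page is not present in RAM")
    pfn = entry & PFN_MASK
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    return (pfn << PAGE_SHIFT) | offset


# Values quoted in the question for the region 00400000-00402000.
entries = [0xA600000000036E26, 0xA60000000008A4C3]
print(hex(physical_address(0x400cb0, 0x400000, entries)))  # 0x36e26cb0
```

Under these assumptions, two processes share the function's page if the PFN computed for that address is the same in both.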

Does the virtual memory area struct only come into the picture when there is a page fault?

Virtual memory is quite a complex topic for me, and I am trying to understand it. Here is my understanding for a 32-bit system, with just 2 GB of RAM as an example. I have read many links, and I am still not confident. I would like your help in clearing up my concepts. Please confirm my points, and correct whatever you feel is wrong. There is also a section of points I am confused about. So, here is the summary.
Every process thinks it is the only one running. It can access 4 GB of memory, its virtual address space.
When a process accesses a virtual address, it is translated to a physical address by the MMU.
This MMU is part of the CPU, i.e. hardware.
When the MMU cannot translate the address to a physical one, it raises a page fault.
On a page fault, the kernel is notified. The kernel checks the VM area struct. If it can find the address, perhaps on disk, it will do some page-in/page-out and get this memory into RAM.
Now the MMU will try again and will succeed this time.
If the kernel cannot find the address, it will raise a signal. For example, an invalid access will raise a SIGSEGV.
Confused points.
Is the page table maintained in the kernel? Does this VM area struct have a page table?
How can the MMU fail to find the address in physical RAM? Let's say it translates to some wrong address in RAM. The code will still execute, but with a bad address. How does the MMU ensure that it is reading the right data? Does it consult the kernel VM area every time?
Is the mapping table (virtual to physical) inside the MMU? I have read that it is maintained per process. If it is inside a process, why can't I see it?
Or, if it is in the MMU, how does the MMU generate the address? Is it segment + 12-bit shift -> page frame number, and then the addition of the offset (bits 0 to 11) -> physical address?
Does it mean that for a 32-bit architecture, with this calculation in mind, I can determine the physical address from a virtual address?
cat /proc/pid_value/maps shows me the current mapping of the VM areas. Basically, it reads the VM area struct and prints it, which means that this is important. I am not able to fit this piece into the complete picture. Is the VM area struct generated when the program is executed? Does the VM area only come into the picture when the MMU cannot translate the address, i.e. on a page fault? When I print the VM area, it displays the address range, permissions, the file it is mapped to, and the offset. I am sure this file is the one on the hard disk and the offset is into that file.
The high-mem concept is that the kernel cannot directly access the memory region greater than 1 GB (approx.). Thus, it needs a page table to map it indirectly, and it will temporarily load some page table to map the address. Does high memory come into the picture every time? After all, user space can directly translate the address via the MMU. In what scenario does the kernel really want to access high memory? I believe kernel drivers will mostly be using kmalloc, which returns a direct memory + offset address, so no mapping is really required. So the question is: in what scenario does the kernel need to access high memory?
Does the processor specifically come with MMU support? Can those that don't have MMU support not run Linux?
Is the page table maintained in the kernel? Does this VM area struct have a page table?
Yes. Not exactly: each process has a mm_struct, which contains a list of vm_area_struct's (which represent abstract, processor-independent memory regions, aka mappings), and a field called pgd, which is a pointer to the processor-specific page table (which contains the current state of each page: valid, readable, writable, dirty, ...).
The page table doesn't need to be complete, the OS can generate each part of it from the VMAs.
How can the MMU fail to find the address in physical RAM? Let's say it translates to some wrong address in RAM. The code will still execute, but with a bad address. How does the MMU ensure that it is reading the right data? Does it consult the kernel VM area every time?
The translation fails, e.g. because the page was marked as invalid, or a write access was attempted against a readonly page.
Is the mapping table (virtual to physical) inside the MMU? I have read that it is maintained per process. If it is inside a process, why can't I see it?
Or, if it is in the MMU, how does the MMU generate the address? Is it segment + 12-bit shift -> page frame number, and then the addition of the offset (bits 0 to 11) -> physical address?
Does it mean that for a 32-bit architecture, with this calculation in mind, I can determine the physical address from a virtual address?
There are two kinds of MMUs in common use. One of them only has a TLB (Translation Lookaside Buffer), which is a cache of the page table. When the TLB doesn't have a translation for an attempted access, a TLB miss is generated, the OS does a page table walk, and puts the translation in the TLB.
The other kind of MMU does the page table walk in hardware.
In any case, the OS maintains a page table per process, which maps virtual page numbers to physical frame numbers. This mapping can change at any moment: when a page is paged in, the physical frame it is mapped to depends on the availability of free memory.
cat /proc/pid_value/maps shows me the current mapping of the VM areas. Basically, it reads the VM area struct and prints it, which means that this is important. I am not able to fit this piece into the complete picture. Is the VM area struct generated when the program is executed? Does the VM area only come into the picture when the MMU cannot translate the address, i.e. on a page fault? When I print the VM area, it displays the address range, permissions, the file it is mapped to, and the offset. I am sure this file is the one on the hard disk and the offset is into that file.
To a first approximation, yes. Beyond that, there are many reasons why the kernel may decide to fiddle with a process' memory, e.g: if there is memory pressure it may decide to page out some rarely used pages from some random process. User space can also manipulate the mappings via mmap(), execve() and other system calls.
The high-mem concept is that the kernel cannot directly access the memory region greater than 1 GB (approx.). Thus, it needs a page table to map it indirectly, and it will temporarily load some page table to map the address. Does high memory come into the picture every time? After all, user space can directly translate the address via the MMU. In what scenario does the kernel really want to access high memory? I believe kernel drivers will mostly be using kmalloc, which returns a direct memory + offset address, so no mapping is really required. So the question is: in what scenario does the kernel need to access high memory?
Totally unrelated to the other questions. In summary, high memory is a hack to be able to access lots of memory on a computer with a limited address space.
Basically, the kernel has a limited address space reserved for it. On x86, a typical user/kernel split is 3 GB/1 GB. (A process can run in user space or kernel space; it runs in kernel space when a syscall is invoked. To avoid having to switch the page table on every such transition, on x86 the address space is typically split between user space and kernel space.) So the kernel can directly access up to ~1 GB of memory. To access more physical memory, there is some indirection involved, which is what high memory is all about.
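A simplified model of the 3 GB/1 GB split, to make the limit concrete. This is an idealized sketch: it treats the whole 1 GiB kernel window as a linear map of physical memory, whereas a real kernel reserves part of that window for other uses, so the actual directly mapped limit is somewhat lower (hence the "~").

```python
PAGE_OFFSET = 0xC0000000                     # start of kernel space in a 3G/1G split
KERNEL_WINDOW = 0x1_0000_0000 - PAGE_OFFSET  # 1 GiB of kernel virtual addresses


def direct_map(phys: int):
    """Kernel virtual address for phys in the linear map, or None if it is high memory."""
    if phys >= KERNEL_WINDOW:
        return None  # needs a temporary mapping instead
    return PAGE_OFFSET + phys


print(hex(direct_map(0x00100000)))  # 0xc0100000
print(direct_map(0x60000000))       # None: 1.5 GiB lies above the window
```

With the 2 GB of RAM from the question, roughly the upper half of physical memory falls outside the window in this model.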
Does the processor specifically come with MMU support? Can those that don't have MMU support not run Linux?
Laptop/desktop processors come with an MMU. x86 supports paging since the 386.
Linux, especially the variant called µClinux, supports processors without MMUs (!MMU). Many embedded systems (ADSL routers, ...) use processors without an MMU. There are some important restrictions, among them:
Some syscalls don't work at all: e.g fork().
Some syscalls work with restrictions and non-POSIX conforming behavior: e.g mmap()
The executable file format is different: e.g bFLT or ELF-FDPIC instead of ELF.
The stack cannot grow, and its size has to be set at link-time.
When a program is loaded, does the kernel first set up a kernel VM area for that process? Does this kernel VM area hold where the program sections are in memory/HDD? Then the entire story of updating the CR3 register and the page table walk or TLB comes into the picture, right? So, whenever there is a page fault, does the kernel update the page table by looking at the kernel VM area? But they say the kernel VM area keeps updating. How is this possible, given that cat /proc/pid_value/maps keeps updating? The map won't be constant from start to end. So, is the real information available in the kernel VM area struct? Is this the actual information about where the sections of the program lie, whether on the HDD or in physical memory (RAM)? So, is this filled in during process loading, as the first job? Does the kernel do the page-in/page-out on a page fault and update the kernel VM area? So it should also know the entire program location on the HDD for page-in/page-out, right? Please correct me here. This is a continuation of my first question from the previous comment.
When the kernel loads a program, it will set up several VMAs (mappings), according to the segments in the executable file (which on ELF files you can see with readelf --segments), which will be the text/code segment, data segment, etc. During the lifetime of the program, additional mappings may be created by the dynamic/runtime linkers, by the memory allocator (malloc(), which may also extend the data segment via brk()), or directly by the program via mmap(), shm_open(), etc.
The VMAs contain the necessary information to generate the page table, e.g. they tell whether that memory is backed by a file or by swap (anonymous memory). So, yes, the kernel will update the page table by looking at the VMAs. The kernel will page in memory in response to page faults, and will page out memory in response to memory pressure.
Using x86 with no PAE as an example:
On x86 with no PAE, a linear address can be split into 3 parts: the top 10 bits point to an entry in the page directory, and the middle 10 bits point to an entry in the page table pointed to by that page directory entry. The page table entry may contain a valid physical frame number: the top 20 bits of a physical address. The bottom 12 bits of the virtual address are an offset into the page that goes untranslated into the physical address.
Each time the kernel schedules a different process, the CR3 register is written with a pointer to the page directory of the current process. Then, each time a memory access is made, the MMU tries to look for a translation cached in the TLB. If it doesn't find one, it looks for one by doing a page table walk starting from CR3. If it still doesn't find one, a page fault (#PF) is raised, the CPU switches to Ring 0 (kernel mode), and the kernel tries to find one in the VMAs.
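The 10/10/12 split and the two-level walk can be mimicked with nested dictionaries. This is a toy model, not how the hardware stores its tables; the frame number 0x12345 is hypothetical, and permission bits and the TLB are left out.

```python
def split_linear(addr: int):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    return (addr >> 22) & 0x3FF, (addr >> 12) & 0x3FF, addr & 0xFFF


def translate(addr: int, page_directory: dict) -> int:
    """Two-level walk; raises LookupError where the MMU would raise a page fault."""
    dir_index, table_index, offset = split_linear(addr)
    page_table = page_directory.get(dir_index)
    if page_table is None:
        raise LookupError("page fault: no page table for this directory entry")
    frame = page_table.get(table_index)
    if frame is None:
        raise LookupError("page fault: page not present")
    return (frame << 12) | offset  # 20-bit frame number + 12-bit offset


# Directory entry 1 -> page table whose entry 0 maps to hypothetical frame 0x12345.
page_directory = {1: {0: 0x12345}}

print(split_linear(0x00400CB0))                     # (1, 0, 3248)
print(hex(translate(0x00400CB0, page_directory)))   # 0x12345cb0
```

An address whose directory or table entry is missing raises LookupError here, which is the point at which the kernel would consult the VMAs.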
Also, I believe this reading from CR3, page directory -> page table -> page frame number -> memory address, is all done by the MMU. Am I correct?
On x86, yes, the MMU does the page table walk. On other systems (e.g: MIPS), the MMU is little more than the TLB, and on TLB miss exceptions the kernel does the page table walk by software.
Though this is not going to be the best answer, I would like to share my thoughts on the confused points.
1. Does Page table is maintained...
Yes, the kernel maintains the page tables. In fact, it maintains nested page tables. The top of the page tables is stored in top_pmd; pmd, I believe, stands for page middle directory. You can traverse all the page tables using this structure.
2. How MMU cannot find the address in physical RAM.....
I am not sure I understood the question. But if, because of some problem, an instruction faults or something outside its instruction area is accessed, you generally get an undefined instruction exception, resulting in an undefined exception abort. If you look at the crash dumps, you can see it in the kernel log.
3. Is the Mapping table - virtual to physical is inside a MMU...
Yes. The MMU is SW + HW. The HW is the TLB and so on; the mapping tables are stored there. For instructions, that is, for the code section, whenever I converted between physical and virtual addresses they always matched. Almost all the time they match for data sections as well.
4. cat /proc/pid_value/maps. This shows me the current mapping of the vmarea....
This is mostly used for analyzing the virtual addresses of user space stacks. As you know, every user space program can have 4 GB of virtual address space. So, unlike in the kernel, if I say 0xc0100234, you cannot directly go and point to the instruction. You need this mapping and the virtual address to locate the instruction based on the data you have.
5. The high-mem concept is that kernel cannot directly access the Memory...
High-mem corresponds to user space memory (someone correct me if I am wrong). When the kernel wants to read some data from an address in user space, it will be accessing HIGHMEM.
6. Does the processor specifically comes with the MMU support. Those who doesn't have MMU support cannot run LInux?
The MMU, as I mentioned, is HW + SW. So mostly it comes with the chipset, and the SW is generally architecture dependent. You can disable the MMU in the kernel config and build. I have never tried it, though. These days almost all chipsets have one, but I think small boards disable the MMU. I am not entirely sure, though.
As all of these are conceptual questions, I may be lacking some knowledge and be wrong in places. If so, others please correct me.

Resources