Virtual Memory System - C

I have a virtual memory system that consists of:
• 32-bit virtual address
• 4-kbyte virtual page size
• 32-bit Page Table Entry (PTE)
• 2-Gbyte physical memory
I have been asked to find the number of physical frames available in the system and the size (in bytes) of the page table.
I have found the answer for the number of physical frames, which I think is:
physical memory/virtual page size
2^31/2^12 = 2^19 = 524,288
Firstly, I want to know if that is correct.
Secondly, I would like to calculate the size of the page table in bytes.
Thanks in advance.

LA (logical address) = 32 bits
=> LAS (logical address space) = 2^32 bytes
PA (physical address) = 31 bits (2 GByte = 2^31 bytes)
=> PAS (physical address space) = 2^31 bytes
We know that page size == frame size.
No. of pages = LAS / page size = 2^(32-12) = 2^20 = 1 M pages
No. of frames = PAS / frame size = 2^(31-12) = 2^19 frames, which confirms your 524,288.
Since the number of entries in the page table equals the number of pages in the LAS:
page table size = no. of entries * entry size
=> page table size = 2^20 * 4 bytes = 2^22 bytes = 4 MB
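As a quick sanity check, here is the same arithmetic in a few lines of C (a sketch, using only the sizes given in the question):

#include <stdio.h>

int main(void)
{
    unsigned long long las  = 1ULL << 32; /* 32-bit virtual addresses */
    unsigned long long pas  = 1ULL << 31; /* 2-GByte physical memory  */
    unsigned long long page = 1ULL << 12; /* 4-KByte pages            */
    unsigned long long pte  = 4;          /* 32-bit page table entry  */

    printf("pages:            %llu\n", las / page);         /* 1048576 */
    printf("frames:           %llu\n", pas / page);         /* 524288  */
    printf("page table bytes: %llu\n", (las / page) * pte); /* 4194304 */
    return 0;
}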


Kernel sys_call_table address does not match address specified in system.map

I am trying to brush up on C, so I have been playing around with the Linux kernel's system call table (on 3.13.0-32-generic). I found a resource online that searches for the system call table with the following function, which I load into the kernel in an LKM:
#include <linux/types.h>    /* uint64_t */
#include <linux/kernel.h>   /* printk(), ULLONG_MAX */
#include <linux/unistd.h>   /* __NR_close */
#include <linux/syscalls.h> /* sys_close() */
#include <asm/page.h>       /* PAGE_OFFSET */

static uint64_t **aquire_sys_call_table(void)
{
    uint64_t offset = PAGE_OFFSET;
    uint64_t **sct;

    /* Scan kernel memory upward from the start of the direct mapping,
     * looking for a table whose __NR_close entry points at sys_close. */
    while (offset < ULLONG_MAX) {
        sct = (uint64_t **)offset;
        if (sct[__NR_close] == (uint64_t *)sys_close) {
            printk("\nsys_call_table found at address: 0x%p\n", sct);
            return sct;
        }
        offset += sizeof(void *);
    }
    return NULL;
}
The function works: I am able to use the address it returns to manipulate the system call table. What I don't understand is why the address returned by this function doesn't match the address in /boot/System.map-(KERNEL).
Here is what the function prints:
sys_call_table found at address: 0xffff880001801400
Here is what I get when I search System.map:
$ sudo cat /boot/System.map-3.13.0-32-generic | grep sys_call_table
ffffffff81801400 R sys_call_table
ffffffff81809cc0 R ia32_sys_call_table
Why don't the two addresses match? It's my understanding that the module runs in the kernel's address space, so the address of the system call table should be the same.
The two virtual addresses have the same physical address.
From Documentation/x86/x86_64/mm.txt
<previous description obsolete, deleted>
Virtual memory map with 4 level page tables:
0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
hole caused by [48:63] sign extension
ffff800000000000 - ffff87ffffffffff (=43 bits) guard hole, reserved for hypervisor
ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
ffffc80000000000 - ffffc8ffffffffff (=40 bits) hole
ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space
ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole
ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)
... unused hole ...
ffffec0000000000 - fffffc0000000000 (=44 bits) kasan shadow memory (16TB)
... unused hole ...
ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
... unused hole ...
ffffffff80000000 - ffffffffa0000000 (=512 MB) kernel text mapping, from phys 0
ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space
ffffffffff600000 - ffffffffffdfffff (=8 MB) vsyscalls
ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
The direct mapping covers all memory in the system up to the highest
memory address (this means in some cases it can also include PCI memory
holes).
vmalloc space is lazily synchronized into the different PML4 pages of
the processes using the page fault handler, with init_level4_pgt as
reference.
Current X86-64 implementations only support 40 bits of address space,
but we support up to 46 bits. This expands into MBZ space in the page tables.
->trampoline_pgd:
We map EFI runtime services in the aforementioned PGD in the virtual
range of 64Gb (arbitrarily set, can be raised if needed)
0xffffffef00000000 - 0xffffffff00000000
-Andi Kleen, Jul 2004
We know that the virtual address range ffff880000000000 - ffffc7ffffffffff is the direct mapping of all physical memory. When the kernel wants to access all physical memory, it uses the direct mapping. It is also what your search loop uses.
The range ffffffff80000000 - ffffffffa0000000 is the kernel text mapping. When kernel code executes, the rip register goes through the kernel text mapping.
In arch/x86/include/asm/page_64.h, we can get the relation of virtual address and physical address.
static inline unsigned long __phys_addr_nodebug(unsigned long x)
{
    unsigned long y = x - __START_KERNEL_map;

    /* use the carry flag to determine if x was < __START_KERNEL_map */
    x = y + ((x > y) ? phys_base : (__START_KERNEL_map - PAGE_OFFSET));

    return x;
}
and
// arch/x86/include/asm/page_types.h
#define PAGE_OFFSET ((unsigned long)__PAGE_OFFSET)
// arch/x86/include/asm/page_64_types.h
#define __START_KERNEL_map _AC(0xffffffff80000000, UL)
#define __PAGE_OFFSET _AC(0xffff880000000000, UL)
As for the addresses mentioned in the question above:
What the function prints:
sys_call_table found at address: 0xffff880001801400
What System.map gives:
$ sudo cat /boot/System.map-3.13.0-32-generic | grep sys_call_table
ffffffff81801400 R sys_call_table
ffffffff81809cc0 R ia32_sys_call_table
Both of them resolve to the same physical address: the virt->phys conversion is set up so that corresponding addresses in the 'direct' mapping region and in the 'kernel text' mapping region resolve to the same physical address.
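You can check the arithmetic yourself. A small userspace sketch (assuming phys_base is 0, i.e. the kernel has not been relocated):

#include <stdio.h>

#define __PAGE_OFFSET      0xffff880000000000UL
#define __START_KERNEL_map 0xffffffff80000000UL

int main(void)
{
    unsigned long direct = 0xffff880001801400UL; /* printed by the module */
    unsigned long text   = 0xffffffff81801400UL; /* listed in System.map  */

    /* assuming phys_base == 0 (kernel not relocated) */
    printf("direct mapping -> phys 0x%lx\n", direct - __PAGE_OFFSET);
    printf("text mapping   -> phys 0x%lx\n", text - __START_KERNEL_map);
    return 0;
}

Both lines print 0x1801400, the shared physical address.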
Through the magic of virtual memory mapping, the address you use depends on where you are. The symbol table file System.map is there to help attach gdb or the crash utility to the running system. Inside the kernel, well, is inside the kernel.
You may also have a /proc/kallsyms file for even more values :)
Only root can see the addresses in the /proc/kallsyms file! It is rarely disabled, but you can re-enable it if it is. Note, though, that the addresses listed in System.map and in kallsyms for the same syscall can differ.
If you are using a kernel you built yourself, then System.map is preferable, but if you are using a pre-built kernel (as most of us do), then kallsyms is the right place to look!
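For example, to look the table up on a running system:
$ sudo grep sys_call_table /proc/kallsyms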

Reading a FAT16 file system

I am trying to read a FAT16 file system to gain information about it, like the number of sectors, clusters, bytes per sector, etc.
I am trying to read it like this:
FILE *floppy;
unsigned char bootDisk[512];

floppy = fopen(name, "r");
fread(bootDisk, 1, 512, floppy);

int i;
for (i = 0; i < 80; i++) {
    printf("%u,", bootDisk[i]);
}
and it outputs this:
235,60,144,109,107,100,111,115,102,115,0,0,2,1,1,0,2,224,0,64,11,240,9,0,18,0,2,0,0,0,0,0,0,0,0,0,0,0,41,140,41,7,68,32,32,32,32,32,32,32,32,32,32,32,70,65,84,49,50,32,32,32,14,31,190,91,124,172,34,192,116,11,86,180,14,187,7,0,205,16,
What do these numbers represent and what type are they? Bytes?
You are not reading the values properly; most of the fields are longer than one byte.
From the spec you can obtain the length and meaning of every field in the boot sector:
Offset Size (bytes) Description
0000h  3    Code to jump to the bootstrap code.
0003h  8    OEM ID - Name of the formatting OS.
000Bh  2    Bytes per Sector - Usually 512 bytes per sector.
000Dh  1    Sectors per Cluster.
000Eh  2    Reserved sectors from the start of the volume.
0010h  1    Number of FAT copies - Usually 2 copies are used to prevent data loss.
0011h  2    Number of possible root entries - 512 entries are recommended.
0013h  2    Small number of sectors - Used when volume size is less than 32 MB.
0015h  1    Media Descriptor.
0016h  2    Sectors per FAT.
0018h  2    Sectors per Track.
001Ah  2    Number of Heads.
001Ch  4    Hidden Sectors.
0020h  4    Large number of sectors - Used when volume size is greater than 32 MB.
0024h  1    Drive Number - Used by some bootstrap code, e.g. MS-DOS.
0025h  1    Reserved - Used by Windows NT to decide whether to check disk integrity.
0026h  1    Extended Boot Signature - Indicates that the next three fields are available.
0027h  4    Volume Serial Number.
002Bh  11   Volume Label - Should be the same as in the root directory.
0036h  8    File System Type - The string should be 'FAT16   '.
003Eh  448  Bootstrap code - May shrink in the future.
01FEh  2    Boot sector signature - This is the AA55h signature.
You should probably use a custom struct to read the boot sector.
Like:
typedef struct {
    unsigned char  jmp[3];
    char           oem[8];
    unsigned short sector_size;
    unsigned char  sectors_per_cluster;
    unsigned short reserved_sectors;
    unsigned char  number_of_fats;
    unsigned short root_dir_entries;
    [...]
} my_boot_sector;
Keep in mind your endianness and padding rules in your implementation. This struct is an example only.
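Given those padding and endianness caveats, a safer approach is to read the fields straight out of the 512-byte buffer at their documented offsets. A minimal sketch (the image filename is hypothetical):

#include <stdio.h>
#include <stddef.h>

/* FAT fields are little-endian regardless of host byte order. */
static unsigned read_le16(const unsigned char *buf, size_t off)
{
    return buf[off] | (buf[off + 1] << 8);
}

int main(void)
{
    unsigned char bootDisk[512];
    FILE *floppy = fopen("floppy.img", "rb"); /* hypothetical image name */

    if (!floppy || fread(bootDisk, 1, 512, floppy) != 512) {
        perror("boot sector");
        return 1;
    }
    fclose(floppy);

    printf("Bytes per sector:    %u\n", read_le16(bootDisk, 0x0B));
    printf("Sectors per cluster: %u\n", (unsigned)bootDisk[0x0D]);
    printf("Reserved sectors:    %u\n", read_le16(bootDisk, 0x0E));
    printf("Root dir entries:    %u\n", read_le16(bootDisk, 0x11));
    return 0;
}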
If you need more details this is a thorough example.

How does Linux allocate memory for its physical allocator?

I was recently delving into the details of Linux's memory management as I want to implement something similar for my own toy kernel, so I was hoping someone who's familiar with the details could help me understand one thing. Apparently the physical memory manager is a buddy algorithm, which is further specialised to return blocks of pages of a particular order (0 to 9, with 0 being just a single page). For each order the blocks are stored in a linked list. Say a block of order 5 is requested but is not found on the list of order-5 blocks; the algorithm then searches for a block of order 6, splits it into two, gives the requested half away and moves the other half one order lower (as it is half the size).
What I don't get is how the kernel stores these structures, or how it allocates space for them. Since for order-0 pages you would need 1M entries (each a 4KiB page), does it mean that the kernel allocates 1M * sizeof(struct page)? What about blocks of order 1 and above? Does the kernel reuse allocated blocks by marking them as a higher order, and when it needs to split one in two, just return the block and get one that is unused?
What I don't get is how the kernel stores these structures, or how it allocates space for them. Since for order-0 pages you would need 1M entries (each a 4KiB page), does it mean that the kernel allocates 1M * sizeof(struct page)?
Initialization of zones is done by calling paging_init() (arch/x86/mm/init_32.c; some descriptions: https://www.kernel.org/doc/gorman/html/understand/understand005.html, 2.3 Zone Initialisation, and http://repo.hackerzvoice.net/depot_madchat/ebooks/Mem_virtuelle/linux-mm/vminit.html, Initializing the Kernel Page Tables) from setup_arch(), via native_pagetable_init() and the indirect call x86_init.paging.pagetable_init():
/*
 * paging_init() sets up the page tables - note that the first 8MB are
 * already mapped by head.S.
 ...
 */
void __init paging_init(void)
{
    pagetable_init();
    ...
    zone_sizes_init();
}
pagetable_init() creates kernel page tables in the swapper_pg_dir array of 1024 pgd_ts.
zone_sizes_init() actually defines the zones of physical memory and calls free_area_init_nodes() to initialize them, with the actual work done (for each NUMA node: for_each_online_node(nid) {...}) in free_area_init_node(), which calls three functions:
calculate_node_totalpages() prints the page counts for every node in dmesg.
alloc_node_mem_map() does the actual job of allocating a struct page for every physical page in the node; memory for them is allocated by the bootmem allocator doc1 doc2 (you can see its debug output with the bootmem_debug=1 kernel boot option):
size = (end - start) * sizeof(struct page);
map = alloc_remap(pgdat->node_id, size);
if (!map)
    map = memblock_virt_alloc_node_nopanic(size, pgdat->node_id);
free_area_init_core() (which fills the bitmaps in struct zone). The functionality of free_area_init_core() is described for older kernels in http://repo.hackerzvoice.net/depot_madchat/ebooks/Mem_virtuelle/linux-mm/zonealloc.html#INITIALIZE as:
free_area_init_core(): The memory map is built, and the freelists and buddy bitmaps initialized, in free_area_init_core().
The free lists of orders in each zone are initialized, and every order is marked as having no free pages: free_area_init_core() -> init_currently_empty_zone() -> zone_init_free_lists():
static void __meminit zone_init_free_lists(struct zone *zone)
{
    unsigned int order, t;

    for_each_migratetype_order(order, t) {
        INIT_LIST_HEAD(&zone->free_area[order].free_list[t]);
        zone->free_area[order].nr_free = 0;
    }
}
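To make the splitting step from the question concrete, here is a toy userspace sketch of order-based allocation (plain counters stand in for the kernel's linked lists of struct page; this is not kernel code):

#include <stdio.h>

#define MAX_ORDER 10 /* orders 0..9, as in the question */

/* Toy free lists: nr_free[o] counts free blocks of order o. */
static unsigned nr_free[MAX_ORDER];

/* Take a block from the first non-empty list at or above 'order',
 * splitting larger blocks into buddy halves on the way down. */
static int alloc_block(unsigned order)
{
    unsigned o = order;

    while (o < MAX_ORDER && nr_free[o] == 0)
        o++;                /* search upward for a big enough block */
    if (o == MAX_ORDER)
        return -1;          /* out of memory */
    nr_free[o]--;
    while (o > order) {     /* split: one block becomes two halves */
        o--;
        nr_free[o]++;       /* one half goes on the lower free list */
    }                       /* the other half is the allocation */
    return 0;
}

int main(void)
{
    nr_free[6] = 1;         /* start with a single order-6 block */
    alloc_block(5);         /* request an order-5 block */
    printf("free order-5 blocks after the split: %u\n", nr_free[5]); /* 1 */
    return 0;
}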
PS: There is an init() of sorts in the kernel; it is called start_kernel(), and LXR (the Linux cross-referencer) will help you navigate between functions (I posted links to lxr.free-electrons.com, but there are several online LXRs):
asmlinkage __visible void __init start_kernel(void)
...
    boot_cpu_init();
    page_address_init();
    pr_notice("%s", linux_banner);
    setup_arch(&command_line);

LBA and cluster

I wonder about LBA and cluster numbers.
My questions are these:
1. Is LBA 0 always cluster 2?
2. Then what are clusters 0 and 1 for?
3. Is the only difference between clusters and LBA just where they start on the disk?
4. What is the relation among CHS, LBA, and cluster numbers?
5. And in the following code, what is add ax, WORD [datasector] for?
;************************************************;
; Convert cluster to LBA
; LBA = (cluster - 2) * sectors per cluster
;************************************************;
ClusterLBA:
    sub ax, 0x0002                        ; zero-base the cluster number
    xor cx, cx
    mov cl, BYTE [bpbSectorsPerCluster]   ; convert byte to word
    mul cx
    add ax, WORD [datasector]             ; add the base data sector
    ret
There are many sector numbering schemes on disk drives. One of the earliest was CHS (Cylinder-Head-Sector): one sector is selected by specifying the cylinder (track), read/write head and sector triplet. This numbering scheme depends on the actual physical characteristics of the disk drive.
The first logical sector resides at cylinder 0, head 0, sector 1. The second is sector 2, and so on. If there are no more sectors on the track (e.g. on a 1.44M floppy disk there are 18 sectors per track), then the next head is applied, starting at sector 1 again, and so on.
You can convert CHS addresses to an absolute (or logical) sector number with a little math:
L = (C * Nh + H) * Ns + S - 1
where C, H and S are the cylinder, head and sector numbers according to CHS addressing, while Nh and Ns are the number of heads and the number of sectors per track (cylinder), respectively. The reverse calculation (to convert LBA to CHS) is just as simple.
In this numbering scheme, which is called LBA (Logical Block Addressing), each sector can be identified by a single number. The first logical sector is LBA 0, the second is LBA 1, and so on. This scheme is linear and easier to deal with.
Clusters are simply groups of contiguous sectors on the disk, which are treated together by the operating system and the file system, in order to reduce disk fragmentation and the disk space needed for file system metadata (e.g. to describe in which sectors a specific file can be found on the disk). A cluster may consist of only 1 sector (512 bytes), up to 128 sectors (64 kilobytes) or more, depending on the capacity of the disk.
Again, the logical sector number of the first sector of a cluster can be easily calculated:
L = ((U - Sc) * Nc) + Sd
where U is the cluster number, Nc is the number of sectors in a cluster, Sc is the first valid cluster number, and Sd is the number of the first logical sector available for generic file data. The latter two parameters (Sc and Sd) are completely filesystem and operating system specific values.
Some filesystems (for example FAT16, and the whole FAT family) reserve cluster numbers 0 and 1 for special purposes, which is why the first actual cluster is cluster number two (Sc = 2 in this case). Similarly, there may be some reserved sectors at the beginning of the disk, where no data is allowed to be written or read. This reserved area can range from a few sectors (e.g. a boot record) to millions of sectors (e.g. a completely different partition which precedes our partition on the hard disk).
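Both formulas translate directly into C. A small sketch (the function and parameter names are mine, not from any particular filesystem code):

#include <stdint.h>

/* CHS -> logical sector: L = (C * Nh + H) * Ns + S - 1 */
static uint32_t chs_to_lba(uint32_t c, uint32_t h, uint32_t s,
                           uint32_t heads, uint32_t sectors_per_track)
{
    return (c * heads + h) * sectors_per_track + s - 1;
}

/* Cluster -> first logical sector: L = (U - Sc) * Nc + Sd.
 * For FAT, Sc == 2, which is exactly what the ClusterLBA routine
 * above computes with its 'sub ax, 0x0002'. */
static uint32_t cluster_to_lba(uint32_t u, uint32_t first_cluster,
                               uint32_t sectors_per_cluster,
                               uint32_t first_data_sector)
{
    return (u - first_cluster) * sectors_per_cluster + first_data_sector;
}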
Huh, this was the long answer. After all, the short answers to your questions can be summarized as follows:
1. No, LBA 0 is not always cluster 2; it's filesystem specific (in the case of FAT, cluster 2 is the first available data cluster, but it is not always at LBA 0 - see answer 5).
2. The interpretation of cluster numbers 0 and 1 is also filesystem specific (in the case of FAT, cluster number 0 represents an empty cluster in the File Allocation Table, and cluster number 1 is reserved).
3. No, the main difference is that a cluster number addresses a group of contiguous sectors, while an LBA addresses a single sector on the disk.
4. See the formulas (formulae?), and the accompanying description, in the long answer above.
5. It's hard to tell from such a short piece of assembly, but my best guess would be the number of reserved sectors at the beginning of the partition (denoted by Sd in the formula above).

Programmatically determining file "size on disk" in advance

I need to know how big a given in-memory buffer will be as an on-disk (USB stick) file before I write it. I know that unless the size falls on a block-size boundary, it's likely to get rounded up; e.g. a 1-byte file takes up 4096 bytes on disk. I'm currently doing this using GetDiskFreeSpace() to work out the disk block size, then using it to calculate the on-disk size like this:
GetDiskFreeSpace(szDrive, &dwSectorsPerCluster,
                 &dwBytesPerSector, NULL, NULL);
dwBlockSize = dwSectorsPerCluster * dwBytesPerSector;

if (dwInMemorySize % dwBlockSize != 0)
{
    dwSizeOnDisk = ((dwInMemorySize / dwBlockSize) * dwBlockSize) + dwBlockSize;
}
else
{
    dwSizeOnDisk = dwInMemorySize;
}
Which seems to work fine, BUT GetDiskFreeSpace() only works on disks up to 2GB according to MSDN. GetDiskFreeSpaceEx() doesn't return the same information, so my question is: how else can I calculate this information for drives larger than 2GB? Is there an API call I've missed? Can I assume some hard values depending on the overall disk size?
MSDN only states that the GetDiskFreeSpace() function cannot report volume sizes greater than 2GB. It works fine for retrieving sectors per cluster and bytes per sector; I've used it myself for very similar-looking code ;-)
But if you want disk capacity too, you'll need an additional call to GetDiskFreeSpaceEx().
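Putting that together, here is a sketch of a helper that rounds a size up to a whole number of clusters (the helper name is mine; error handling kept minimal):

#include <windows.h>
#include <stdio.h>

/* Round 'size' up to a whole number of clusters on the given drive. */
static ULONGLONG SizeOnDisk(const char *szDrive, ULONGLONG size)
{
    DWORD dwSectorsPerCluster, dwBytesPerSector;
    ULONGLONG cluster;

    if (!GetDiskFreeSpaceA(szDrive, &dwSectorsPerCluster,
                           &dwBytesPerSector, NULL, NULL))
        return 0; /* call failed */

    cluster = (ULONGLONG)dwSectorsPerCluster * dwBytesPerSector;
    return (size + cluster - 1) / cluster * cluster;
}

int main(void)
{
    /* e.g. prints 4096 on a typical NTFS volume with 4K clusters */
    printf("%llu\n", SizeOnDisk("C:\\", 1));
    return 0;
}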
The size of a file on disk is a fuzzy concept. In NTFS, a file consists of a set of data elements; you're primarily thinking of the "unnamed data stream". That's an attribute of a file that, if small, can be packed with the other attributes in the directory entry. Apparently you can store a data stream of up to 700-800 bytes in the directory entry itself, hence your hypothetical 1-byte file would be as big as a 0-byte or 700-byte file.
Another influence is file compression. This will make the on-disk size potentially smaller than the in-memory size.
You should be able to obtain this information using the DeviceIoControl function and DISK_GEOMETRY_EX. It will return a structure that contains the information you are looking for, I think:
http://msdn.microsoft.com/en-us/library/aa363216(VS.85).aspx
http://msdn.microsoft.com/en-us/library/ms809010.aspx
In ActionScript!
var size:Number = 19912;
var sizeOnDisk:Number = size;
var remainder:Number = size % (1024 * 4);
if (remainder > 0) {
    sizeOnDisk = size + ((1024 * 4) - remainder);
}
trace(size);
trace(sizeOnDisk);
