Consecutive mmap calls never return contiguous addresses - c

The function page_allocate works: it returns the address of mapped pages with the specified alignment. However, consecutive calls with sizes of 64k and 1024k are never contiguous. Why?
./mmap 0x00001000 //4k
./mmap 0x00002000 //8k
./mmap 0x00004000 //16k
./mmap 0x00008000 //32k
./mmap 0x00010000 //64k bad? Can't get contiguous range within consecutive allocation
./mmap 0x00020000 //128k
./mmap 0x00040000 //256k
./mmap 0x00080000 //512k
./mmap 0x00100000 //1024k bad?
./mmap 0x00200000 //2048k
./mmap 0x00400000 //4096k
./mmap 0x00800000 //8192k
./mmap 0x01000000 //16384k
./mmap 0x10000000 //256MB
./mmap 0x100000000 //4096MB
A sample program is included below. Compile it with gcc -Wall -g mmap.c -o mmap.
#define _GNU_SOURCE /* MAP_ANONYMOUS; must come before any include */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h> /* sysconf */
static size_t sys_page_size = 0;
void
page_init(void)
{
int sc;
if(!sys_page_size)
{
sc = sysconf(_SC_PAGESIZE);
if(sc == -1)
{
sys_page_size = 0x0000000000001000; /* Default to 4Kb */
}
else
{
sys_page_size = sc;
}
}
}
void *
page_allocate(size_t request,
size_t alignment)
{
size_t size;
size_t slop;
void *addr;
/* Round up to page size multiple.
*/
request = (request + (sys_page_size - 1)) & ~(sys_page_size -1);
alignment = (alignment + (sys_page_size - 1)) & ~(sys_page_size -1);
size = request + alignment;
/* Maybe we get lucky.
*/
addr = mmap(NULL,
request,
PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS,
-1,
0);
if(!((uintptr_t)addr & (alignment - 1)))
{
return addr;
}
munmap(addr, request);
addr = mmap(NULL,
size,
PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS,
-1,
0);
slop = (uintptr_t)addr & (alignment - 1);
#define POINTER_OFFSET(addr, offset) ((void *)((uintptr_t)(addr) + (offset)))
if(slop)
{
/* Trim the unaligned head, then the leftover tail. */
munmap(addr, alignment - slop);
munmap(POINTER_OFFSET(addr, size - slop), slop);
addr = POINTER_OFFSET(addr, alignment - slop);
}
else
{
/* Already aligned: trim the extra alignment bytes at the end. */
munmap(POINTER_OFFSET(addr, request), alignment);
}
return addr;
}
int
main(int argc,
char **argv)
{
size_t size;
void *cmp = NULL;
void *ptr[16];
int i;
page_init();
if(argc == 2)
{
size = strtol(argv[1], NULL, 16);
}
else
{
size = 0x00001000;
}
for(i = 0;
i < 16;
i ++)
{
ptr[i] = page_allocate(size, size);
}
for(i = 0;
i < 16;
i ++)
{
printf("%2i: %p-%p %s\n",
i,
ptr[i],
(void *)((uintptr_t)ptr[i] + size - 1),
(llabs((long long)((intptr_t)ptr[i] - (intptr_t)cmp)) == (long long)size) ? "contiguous" : "");
cmp = ptr[i];
}
return 0;
}

You should never expect mmap to provide contiguous addresses on its own. You can try to get them by requesting an address that would make the new mapping contiguous, as long as that address range is free, and omitting MAP_FIXED so that the requested address is used only as a hint. (With MAP_FIXED, the new mapping would replace whatever is already there, which is definitely not what you want.)
I suspect the reason you're seeing mmap return contiguous maps for some calls, but not all, is that the kernel is trying to balance the cost of making new VM areas with the desire to have addresses be unpredictable (ASLR).
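The hint approach could be sketched roughly as follows. The names map_after and is_contiguous are made up for illustration, and the caller still has to check the result, because the kernel is free to ignore the hint.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: ask for a new anonymous mapping that starts right
 * after an existing one. Without MAP_FIXED the address is only a hint, so
 * the result may be somewhere else entirely (or MAP_FAILED). */
static void *map_after(void *prev, size_t prev_len, size_t len)
{
    void *hint = (void *)((uintptr_t)prev + prev_len);
    return mmap(hint, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* Hypothetical helper: true if next begins exactly where prev ends. */
static int is_contiguous(const void *prev, size_t prev_len, const void *next)
{
    return (uintptr_t)next == (uintptr_t)prev + prev_len;
}
```

A caller would invoke map_after(prev, prev_len, len), compare the result against MAP_FAILED, and then use is_contiguous to decide whether to keep the new mapping or munmap it and fall back to something else.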

Related

Is there a maximum number of continuous pages per process in Linux? If so, how to set it to unlimited?

The following code fails with errno 12 (cannot allocate memory):
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <memory.h>
#include <errno.h>
int main()
{
char* p;
for (size_t i = 0; i < 0x10000; i++)
{
char* addr = (char*)0xAAA00000000uL - i * 0x2000;
p = mmap(addr, 0x1000,
PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (p != addr) {
printf("%lu %d\n", i, errno);
getchar();
return 1;
}
memset(p, 'A' + (i % 26), 0x1000);
}
return 0;
}
The output is 65510 12 on my machine.
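However, if we change the size of each mapping from 0x1000 to 0x2000, the allocation succeeds, even though it uses more memory.
The only difference I can think of is the number of continuous pages. Is there a limit on this? If so, how can it be set to unlimited?
It seems that setting /proc/sys/vm/max_map_count to a larger number solves the problem. Each mapping that is separated from its neighbours by a gap is a separate VM area, and the number of VM areas per process is capped by max_map_count (65530 by default). With 0x2000-byte mappings at a 0x2000 stride the regions are adjacent, so the kernel can merge them into a single area.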
Reference: How much memory could vm use
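To see the limit a process is running under, a small sketch like the one below can read the current value. read_max_map_count is a made-up helper name, and it assumes /proc is mounted at the usual place.

```c
#include <stdio.h>

/* Hypothetical helper: return the current vm.max_map_count,
 * or -1 if the file cannot be opened or parsed. */
static long read_max_map_count(void)
{
    long value = -1;
    FILE *f = fopen("/proc/sys/vm/max_map_count", "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

Raising the limit needs root, for example with sysctl -w vm.max_map_count=N or by writing the new value to the same /proc file.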

How do you find the end of a partition on a Linux drive?

In one of my classes, our assignment is to find and print some values in the first superblock and then compare those values against all other superblocks at powers of 3, 5 and 7 until the end of the partition.
So for example, you compare superblock0 with superblock1, 0 and 3, 0 and 5, 0 and 7, 0 and 9 and so on.
My code to output the values in a superblock works, and I have an algorithm in mind that will get all powers of 3, 5 and 7, but I'm not sure how to detect the end of the partition,
how to loop through all superblocks with those powers until the end, or what the break case would be.
Below is my code to access the first superblock.
int fd;
super_block_t s;
if((fd = open(DEVICE, O_RDONLY)) < 0){ //check if disk can be read
perror(DEVICE);
exit(1);
}
//read superblock
lseek(fd, OFFSET, SEEK_SET);
read(fd, &s, sizeof(s));
close(fd);
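You can either try to seek there and see if you get an EINVAL, or you can use the BLKGETSIZE64 ioctl to fetch the size of the block device:
#include <linux/fs.h>
#include <inttypes.h>
#include <stdint.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
int main(int argc, char** argv) {
if (argc != 2) { fprintf(stderr, "usage: %s DEVICE\n", argv[0]); return 1; }
int fd = open(argv[1], O_RDONLY);
if (fd < 0) { perror(argv[1]); return 1; }
uint64_t size;
/* BLKGETSIZE64 reports the size of the block device in bytes. */
if (ioctl(fd, BLKGETSIZE64, &size) < 0) { perror("BLKGETSIZE64"); close(fd); return 1; }
printf("Size in bytes: %" PRIu64 "\n", size);
close(fd);
return 0;
}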
There is a difference between the size of the block device a filesystem is on, and the size of that filesystem itself. Given your task is to operate on the superblocks within the filesystem itself, I'll presume that you are more interested in the latter. If you're more interested in the actual device size, then the answer by @that-other-guy is correct.
Assuming that you are working with ext4 for the filesystem, and based on the information here, the full size of the filesystem would be the total block count multiplied by the block size. In the structure of the superblock, the relevant fields are:
s_blocks_count_lo at offset 0x4, and
s_log_block_size at offset 0x18
s_blocks_count_lo is straightforward, but s_log_block_size needs a bit of processing, as the value stored means:
Block size is 2 ^ (10 + s_log_block_size).
Putting all that together, you can do something like:
uintmax_t get_filesystem_size(const char *device) {
int fd;
if((fd = open(device, O_RDONLY)) < 0) {
perror(device);
exit(1);
}
if (lseek(fd, 1024, SEEK_SET) < 0) {
perror("lseek");
exit(1);
}
uint8_t block0[1024];
if (read(fd, block0, sizeof block0) != (ssize_t)sizeof block0) {
perror("read");
exit(1);
}
if (s_magic(block0) != 0xef53) {
fprintf(stderr, "bad magic\n");
exit(1);
}
close(fd);
return s_blocks_count_lo(block0) * s_block_size(block0);
}
with the following ext4 superblock specific helper functions:
uint16_t s_magic(const uint8_t *buffer) {
return le16(buffer + 0x38);
}
uint32_t s_blocks_count_lo(const uint8_t *buffer) {
return le32(buffer + 0x4);
}
uintmax_t s_block_size(const uint8_t *buffer) {
return (uintmax_t)1 << (10 + le32(buffer + 0x18));
}
and the following general endianness helper functions:
uint16_t le16(const uint8_t *buffer) {
unsigned int result = 0;
for (int i = 1; i >= 0; i--) {
result *= 256;
result += buffer[i];
}
return result;
}
uint32_t le32(const uint8_t *buffer) {
uint32_t result = 0; /* unsigned, so values with the top bit set do not overflow */
for (int i = 3; i >= 0; i--) {
result *= 256;
result += buffer[i];
}
return result;
}
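For the loop the question asks about, one possible sketch follows. It assumes the sparse_super layout (backups in group 1 and in groups whose number is a power of 3, 5 or 7) and that the number of block groups is already known, for example from the block count and s_blocks_per_group (offset 0x20), rounded up. backup_groups is a hypothetical helper, not part of any library. The break case is simply a group number that reaches the group count.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: list the block groups that may hold a backup
 * superblock (group 1 and powers of 3, 5 and 7 below group_count).
 * Group 0 holds the primary superblock and is not listed.
 * Writes at most cap entries to out and returns how many were written. */
static size_t backup_groups(uint32_t group_count, uint32_t *out, size_t cap)
{
    static const uint32_t bases[] = {3, 5, 7};
    size_t n = 0;

    if (group_count > 1 && n < cap)
        out[n++] = 1;

    for (size_t b = 0; b < sizeof bases / sizeof bases[0]; b++) {
        /* 64-bit counter so that g *= base cannot wrap around. */
        for (uint64_t g = bases[b]; g < group_count; g *= bases[b]) {
            if (n < cap)
                out[n++] = (uint32_t)g;
        }
    }
    return n;
}
```

With 30 block groups this yields groups 1, 3, 9, 27, 5, 25 and 7. The byte offset of each backup is then the group number times s_blocks_per_group times the block size, which is an assumption worth checking against the filesystem being examined.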

mmap'ed memory access very slow

I use v4l2 to get video frames from a camera using streaming I/O and need to do some calculations on each frame.
However, accessing the frame memory is 10 times slower than accessing malloc'ed memory.
I guess the mmap'ed frame memory is not cacheable by the CPU.
Here is the test code.
//mmap video buffers
struct v4l2_buffer buf;
CLEAR(buf);
buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
buf.memory = V4L2_MEMORY_MMAP;
buf.index = i;
if (-1 == xioctl(_fdCamera, VIDIOC_QUERYBUF, &buf)) {
ERROR("VIDIOC_QUERYBUF");
goto error_querybuf;
}
_buffers[i].start = mmap(NULL, buf.length,
PROT_READ | PROT_WRITE,
MAP_SHARED, _fdCamera, buf.m.offset);
//use mmap'ed buffer to do some calculation
u16* frame=_buffers[0].start;
u64 sum=0;
u16* p=frame;
time[0]=GetMicrosecond64();
while(p!=frame+PIXELS){
sum+=*p;
p++;
}
time[1]=GetMicrosecond64();
printf("mmap:sum %lld,time %lld\n",sum, time[1] - time[0]);
//use a copy of data to do some calculation
u16* frame_copy=(u16*)malloc(PIXELS*2);
memcpy(frame_copy,frame,PIXELS*2);
sum=0;
p=frame_copy;
time[0]=GetMicrosecond64();
while(p!=frame_copy+PIXELS){
sum+=*p;
p++;
}
time[1]=GetMicrosecond64();
printf("malloc:sum %lld,time %lld\n",sum, time[1] - time[0]);
Update:
I use an s5pv210 with linux-2.6.35. fimc_dev.c indicates that the mmap'ed memory is uncached.
How can I make the frame buffer memory support both DMA and caching?
static inline int fimc_mmap_cap(struct file *filp, struct vm_area_struct *vma)
{
struct fimc_prv_data *prv_data =
(struct fimc_prv_data *)filp->private_data;
struct fimc_control *ctrl = prv_data->ctrl;
u32 size = vma->vm_end - vma->vm_start;
u32 pfn, idx = vma->vm_pgoff;
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
vma->vm_flags |= VM_RESERVED;
/*
* page frame number of the address for a source frame
* to be stored at.
*/
pfn = __phys_to_pfn(ctrl->cap->bufs[idx].base[0]);
if ((vma->vm_flags & VM_WRITE) && !(vma->vm_flags & VM_SHARED)) {
fimc_err("%s: writable mapping must be shared\n", __func__);
return -EINVAL;
}
if (remap_pfn_range(vma, vma->vm_start, pfn, size, vma->vm_page_prot)) {
fimc_err("%s: mmap fail\n", __func__);
return -EINVAL;
}
return 0;
}

Bare-metal Loader - Send .elf binary to other processor through shared memory and execute

Setup:
One ARM-CPU (A9) running busybox-Linux. This one talks to the network and gets a precompiled statically linked elf.
Second CPU runs bare-metal application. I have newlib on that one and the whole "OS" sits in memory just executing that one basic program.
Both share OCM.
I have succeeded in making the two processors "talk". I can write hex-values into memory from linux, the other processor reads it and vice-versa.
Now I'd like to parse the aforementioned elf, send it to OCM, have the bare-metal side read it into its memory, set the program counter via asm and execute said elf (could use a .o file as well).
I got stuck at parsing the elf already...
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <byteswap.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <errno.h>
#include <sys/mman.h>
#define PAGE_SIZE ((size_t)getpagesize())
#define PAGE_MASK ((uint64_t)(long)~(PAGE_SIZE - 1))
const unsigned int COMM_RX_DATA = 0xFFFF900c;
int main(int argc, char **argv){
int fd;
int cached = 0;
unsigned char* c;
//uint32_t value;
unsigned char value;
uint64_t offset = COMM_RX_DATA;
uint64_t base;
volatile uint8_t *mm;
fprintf(stderr, "Nr. 0\n");
FILE* f_read;
if (argc != 2) { /* argv[1] is used below, so exactly one argument is required */
fprintf(stderr, "usage: %s ELF_NAME\n", argv[0]);
return 1;
}
fd = open("/dev/mem", O_RDWR);//|(!cached ? O_SYNC : 0));
if (fd < 0) {
fprintf(stderr, "open(/dev/mem) failed (%d)\n", errno);
return 1;
}
f_read = fopen(argv[1], "rb");
if(!f_read){
fprintf(stderr, "read failed");
return 1;
}
else {
printf("Nr. 1\n");
base = offset & PAGE_MASK;
offset &= ~PAGE_MASK;
mm = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);
fseek(f_read, 0, SEEK_END);
int size = ftell(f_read);
fseek(f_read, 0, SEEK_SET); //Reset stream
c = malloc(size);
//malloc error checking!
while (fgets(c, size, f_read) != NULL ){ //tried fgetc but segfaults
//tmp-output to stdout
puts(c);
value = c;//strtoull((char*)c, NULL, 0);
printf("Writing %d to %d", (int)value, (int)(mm + offset));
*(volatile uint32_t *)(mm + offset) = value;
printf("size: %d , value = %s\n", size, value);
}
}
munmap((void *)mm, PAGE_SIZE);
fclose(f_read);
return 0;
}
_asm-idea.S:
ExecuteR0:
mov lr, r0 /* move the destination address into link register */
mcr 15,0,r0,cr7,cr5,0 /* Invalidate Instruction cache */
mcr 15,0,r0,cr7,cr5,6 /* Invalidate branch predictor array */
dsb
isb /* make sure it completes */
ldr r4, =0
mcr 15,0,r4,cr1,cr0,0 /* disable the ICache and MMU */
isb /* make sure it completes */
bx lr /* force the switch, destination should have been in r0 */
Help me, mighty SO, you're my only hope.
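As a possible starting point for the parsing step, here is a minimal sketch that validates a 32-bit ELF image already loaded into a buffer and pulls out its entry point, using the Elf32_Ehdr definition from <elf.h>. The helper name elf32_entry is made up, and the sketch assumes the host and the target share the same byte order.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: check that image looks like a 32-bit ELF file and
 * store its entry point in *entry. Returns 0 on success, -1 otherwise.
 * Assumes host byte order matches the ELF file's byte order. */
static int elf32_entry(const unsigned char *image, size_t len, uint32_t *entry)
{
    Elf32_Ehdr hdr;

    if (len < sizeof hdr)
        return -1;
    memcpy(&hdr, image, sizeof hdr); /* copy to avoid unaligned access */
    if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    if (hdr.e_ident[EI_CLASS] != ELFCLASS32)
        return -1;
    *entry = hdr.e_entry;
    return 0;
}
```

The image would be read with fread into a buffer of the file's size rather than with fgets, since fgets treats the data as text and stops at every newline byte.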

How to use /dev/kmem?

Updated my post...
I got the program below. It operates on /dev/kmem and /dev/mem.
I think I can learn something from the code, but when I run it on my Beagle Board, I get the results below:
case 1: ( if(1) )
root@omap:/home/ubuntu/tom# ./kmem_mem /boot/System.map-3.0.4-x3
found jiffies at (0xc0870080) c0870080
/dev/kmem read buf = 319317
jiffies=319317 (read from virtual memory)
/dev/mem: the offset is 870080
the page size = 4096
mmap: Invalid argument
case 2: ( if(0) )
root@omap:/home/ubuntu/tom# ./kmem_mem /boot/System.map-3.0.4-x3
found jiffies at (0xc0870080) c0870080
/dev/kmem read buf = 333631
jiffies=333631 (read from virtual memory)
/dev/mem: the offset is 870080
/dev/mem read failed: Bad address
jiffies=0 (read from physical memory)
I also ran the commands below so that mmap can use NULL as its first parameter.
root@omap:/home/ubuntu/tom# echo 0 > /proc/sys/vm/mmap_min_addr
root@omap:/home/ubuntu/tom# cat /proc/sys/vm/mmap_min_addr
0
As you can see, read_kmem() works fine but read_mem() doesn't, and it seems that the offset passed to it is wrong. But kernel address - PAGE_OFFSET (0xC0000000) = physical address; is that wrong?
My questions are:
(1) Why do I get "mmap: Invalid argument" in case 1?
(2) Why does the mmap only map PAGE_SIZE bytes?
(3) What's wrong with read_mem?
Can anyone help?
Thanks!
/*
* getjiff.c
*
* this toolkit shows how to get jiffies value from user space:
* 1. find jiffies's address from kernel image.
* 2. access virtual address space to get jiffies value.
* 3. access physical address space to get jiffies value.
*
* demonstrate following techniques:
* o get ELF object symbol address by calling nlist()
* o access virtual memory space from /dev/kmem
* o access physical memory space from /dev/mem
*/
#include <stdio.h>
#include <stdlib.h> //exit
#include <linux/a.out.h> //nlist
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <memory.h>
#define LONG *(volatile unsigned long*)
/* read from virtual memory */
int read_kmem(off_t offset, void* buf, size_t count)
{
int fd;
int n;
fd = open("/dev/kmem", O_RDONLY);
if (fd < 0)
{
perror("open /dev/kmem failed");
return -1;
}
lseek(fd, offset, SEEK_SET);
n = read(fd, buf, count);
if (n != count)
perror("/dev/kmem read failed");
else
printf("/dev/kmem read buf = %ld\n", *(unsigned long *)buf);
close(fd);
return n;
}
/* read from physical memory */
int read_mem(off_t offset, void* buf, size_t count)
{
int fd;
int n;
int page_size;
void *map_base;
unsigned long value;
printf("/dev/mem: the offset is %lx\n", offset);
fd = open("/dev/mem", O_RDONLY);
if (fd < 0)
{
perror("open /dev/mem failed");
return -1;
}
if(1){
page_size = getpagesize();
printf("the page size = %d\n", page_size);
map_base = mmap(0,page_size,PROT_READ,MAP_SHARED,fd,offset);
if (map_base == MAP_FAILED){
perror("mmap");
exit(1);
}
value = LONG(map_base);
printf("/dev/mem: the value is %ld\n", value);
buf = (unsigned long *)map_base;
}
if(0){
lseek(fd, offset, SEEK_SET);
n = read(fd, buf, count);
if (n != count)
perror("/dev/mem read failed");
else
printf("/dev/mem read buf = %ld\n", *(unsigned long *)buf);
}
close(fd);
return n;
}
int main(int argc, char **argv)
{
FILE *fp;
char addr_str[11]="0x";
char var[51];
unsigned long addr;
unsigned long jiffies;
char ch;
int r;
if (argc != 2) {
fprintf(stderr,"usage: %s System.map\n",argv[0]);
exit(-1);
}
if ((fp = fopen(argv[1],"r")) == NULL) {
perror("fopen");
exit(-1);
}
do {
r = fscanf(fp,"%8s %c %50s\n",&addr_str[2],&ch,var); // format of System.map
if (strcmp(var,"jiffies")==0)
break;
} while(r > 0);
if (r < 0) {
printf("could not find jiffies\n");
exit(-1);
}
addr = strtoul(addr_str,NULL,16); //Convert string to unsigned long integer
printf("found jiffies at (%s) %08lx\n",addr_str,addr);
read_kmem(addr, &jiffies, sizeof(jiffies));
printf("jiffies=%ld (read from virtual memory)\n\n", jiffies);
jiffies = 0; //reinit for checking read_mem() below
read_mem(addr-0xC0000000, &jiffies, sizeof(jiffies));
printf("jiffies=%ld (read from physical memory)\n", jiffies);
return 0;
}
I've tried combinations of offset and bs for dd and found this solution:
On PC, in build directory I've found location of jiffies.
grep -w jiffies System.map
c04660c0 D jiffies
On PandaBoard:
In /proc/iomem you can see:
80000000-9c7fffff : System RAM
80008000-80435263 : Kernel code
80464000-804d0d97 : Kernel data
a0000000-bfefffff : System RAM
RAM starts at physical address 0x80000000, and kernel data starts at 0x80464000, which looks similar to the address of jiffies.
Then convert from the virtual address to the physical one: virt - 0xC0000000 + 0x80000000.
dd if=/dev/mem skip=$((0x804660c)) bs=$((0x10)) count=1 2> /dev/null | hexdump
0000000 02b9 0002 0001 0000 0000 0000 0000 0000
0000010
Run it several times and watch the value increment.
Summary: /dev/mem uses physical addresses, and RAM starts at physical address 0x80000000.
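The conversion used above can be written out as a small worked example. lowmem_virt_to_phys is a made-up name, and the arithmetic only holds for directly mapped kernel memory on a 32-bit system where both offsets are known.

```c
#include <stdint.h>

/* Hypothetical helper: translate a directly mapped kernel virtual address
 * to a physical address, given the kernel's virtual base (page_offset) and
 * the physical address where RAM starts (phys_offset). */
static uint32_t lowmem_virt_to_phys(uint32_t virt,
                                    uint32_t page_offset,
                                    uint32_t phys_offset)
{
    return virt - page_offset + phys_offset;
}
```

For the jiffies address 0xc04660c0 with a virtual base of 0xC0000000 and RAM starting at 0x80000000, this gives 0x804660c0, which is the dd skip value 0x804660c multiplied by the block size 0x10.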
For the invalid argument in case 1, the problem is that the offset is not page-aligned. mmap(2) works by manipulating page tables, so it only works on multiples of the page size for both size and offset.
As for the second case, I'm not sure you're guaranteed to have kernel space begin at the 3G boundary. Also, I'm pretty sure that's the boundary of the kernel's virtual address space, not its location in physical memory, so on the Beagle Board you quite possibly ended up with a wrapped-around offset pointing who-knows-where.
I think what you might need is PHYS_OFFSET, not PAGE_OFFSET.
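To illustrate the page-alignment point, the offset can be split into a page-aligned base for mmap and a remainder to add to the returned pointer. This is only a sketch: map_window and page_window are made-up names, and the page size is assumed to be a power of two.

```c
#include <stdint.h>

/* Hypothetical type: where to map, and where the target sits in the map. */
struct map_window {
    uint64_t base;  /* page-aligned offset to pass to mmap */
    uint64_t delta; /* distance from base to the address of interest */
};

/* Hypothetical helper: split offset into an aligned base and a remainder.
 * page_size must be a power of two. */
static struct map_window page_window(uint64_t offset, uint64_t page_size)
{
    struct map_window w;
    w.base = offset & ~(page_size - 1);
    w.delta = offset - w.base;
    return w;
}
```

For the offset 0x870080 from the question and a 4096-byte page, the base is 0x870000 and the delta is 0x80, so the value would be read at map_base + 0x80 after mapping from the base.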
