Is there really no mremap in Darwin?

Is there really no mremap in Darwin? - c

I'm trying to find out how to remap memory-mapped files on a Mac (when I want to expand the available space).
I see our friends in the Linux world have mremap but I can find no such function in the headers on my Mac. /Developer/SDKs/MacOSX10.6.sdk/usr/include/sys/mman.h has the following:
mmap
mprotect
msync
munlock
munmap
but no mremap
man mremap confirms my fears.
I'm currently having to munmap and mmmap if I want to resize the size of the mapped file, which involves invalidating all the loaded pages. There must be a better way. Surely?
I'm trying to write code that will work on Mac OS X and Linux. I could settle for a macro to use the best function in each case if I had to but I'd rather do it properly.

If you need to shrink the map, just munmap the part at the end you want to remove.
If you need to enlarge the map, you can mmap the proper offset with MAP_FIXED to the addresses just above the old map, but you need to be careful that you don't map over something else that's already there...
The above text under strikeout is an awful idea; MAP_FIXED is fundamentally wrong unless you already know what's at the target address and want to atomically replace it. If you are trying to opportunistically map something new if the address range is free, you need to use mmap with a requested address but without MAP_FIXED and see if it succeeds and gives you the requested address; if it succeeds but with a different address you'll want to unmap the new mapping you just created and assume that allocation at the requested address was not possible.

If you expand in large enough chunks (say, 64 MB, but it depends on how fast it grows) then the cost of invalidating the old map is negligible. As always, benchmark before assuming a problem.

You can ftruncate the file to a large size (creating a hole) and mmap all of it. If the file is persistent I recommend filling the hole with write calls rather than by writing in the mapping, as otherwise the file's blocks may get unnecessarily fragmented on the disk.

I have no experience with memory mapping, but it looks like you can temporarily map the same file twice as a means to expand the mapping without losing anything.
int main() {
int fd;
char *fp, *fp2, *pen;
/* create 1K file */
fd = open( "mmap_data.txt", O_RDWR | O_CREAT, 0777 );
lseek( fd, 1000, SEEK_SET );
write( fd, "a", 1 );
/* map and populate it */
fp = mmap( NULL, 1000, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0 );
pen = memset( fp, 'x', 1000 );
/* expand to 8K and establish overlapping mapping */
lseek( fd, 8000, SEEK_SET );
write( fd, "b", 1 );
fp2 = mmap( NULL, 7000, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0 );
/* demonstrate that mappings alias */
*fp = 'z';
printf( "%c ", *fp2 );
/* eliminate first mapping */
munmap( fp, 1000 );
/* populate second mapping */
pen = memset( fp2+10, 'y', 7000 );
/* wrap up */
munmap( fp2, 7000 );
close( fd );
printf( "%d\n", errno );
}
The output is zxxxxxxxxxyyyyyy.....
I suppose, if you pound on this, it may be possible to run out of address space faster than with mremap. But nothing is guaranteed either way and it might on the other hand be just as safe.

Related

mmap file with larger fixed length with zero padding?

I want to use mmap() to read a file with fixed length (eg. 64MB), but there also some files < 64MB.
I mmap this files (<64MB, eg.30MB) with length = 64MB, when read file data beyond file size(30MB - 64MB), the program got a bus-error.
I want mmap these files with fixed length, and read 0x00 when pointer beyond file size. How to do that?
One method I can think is ftruncate file first, and ftruncate back to ori size, but I don't think this method is perfect.

This is one of the few reasonable use cases for MAP_FIXED, to remap part of an existing mapping to use a new backing file.
A simple solution here is to unconditionally mmap 64 MB of anonymous memory (or explicitly mmap /dev/zero), without MAP_FIXED and store the resulting pointer.
Next, mmap 64 MB or your actual file size (whichever is less) of your actual file, passing in the result of the anonymous/zero mmap and passing the MAP_FIXED flag. The pages corresponding to your file will no longer be anonymous/zero mapped, and instead will be backed by your file's data; the remaining pages will be backed by the anonymous/zero pages.
When you're done, a single munmap call will unmap all 64 MB at once (you don't need to separately unmap the real file pages and the zero backed pages).
Extremely simple example (no error checking, please add it yourself):
// Reserve 64 MB of contiguous addresses; anonymous mappings are always zero backed
void *mapping = mmap(NULL, 64 * 1024 * 1024, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
// Open file and check size
struct stat sb;
int fd = open(myfilename, O_RDONLY);
fstat(fd, &sb);
// Use smaller of file size or 64 MB
size_t filemapsize = sb.st_size > 64 * 1024 * 1024 ? 64 * 1024 * 1024 : sb.st_size;
// Remap up to 64 MB of pages, replacing some or all of original anonymous pages
mapping = mmap(mapping, filemapsize, PROT_READ, MAP_SHARED | MAP_FIXED, fd, 0);
close(fd);
// ... do stuff with mapping ...
munmap(mapping, 64 * 1024 * 1024);

Mmap a large 1TB file and write 1's to it?

I am very new to mmap and memset. I have been assigned a task to create a large file (1TB) and write 1's to it as we are trying to understand the performance.
Now from what I understand, I can basically fallocate a file with 1Tb, then in a C function, I can mmap it with PROT_READ, PROT_WRITE, MAP_SHARED and then memset that mmap'ed pointed with memset, like this :
int fd = open(FILEPATH, O_RDWE, (mode_t)0700);
size_t data_length = 1000000000000;
char *data = (char*)mmap(NULL, data_length, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, fd , 0);
memset(data, '1', data_length);
Is this correct?
Do I need to sync or anything to make this data persistent?
If so, I'm basically writing a 1TB file then why does my function run within split seconds.
I've tried to cat the output file and there indeed are 1's, its just that my terminal craps out after a few seconds, but the data is actually getting written.
Am I doing this correctly or should I actually go and write data to memory rather than memset. If so, how should I do it?
Thanks for any help.

mmap() fails while devmem2 succeeds (C/CPP) [Allwinner A20]

I am trying to access the hardware registers of an A20 SOM by mapping them to userspace. In this case, target is the PIO, listed at physical address 0x01C20800.
The official Olimex Debian7 (wheezy) image is being used. Kernel Linux a20-olimex 3.4.90+
I was able to verify the location by using the devmem2 tool and Allwinner's documentation on the said memory space (switched the pinmode and level with devmem).
The mmap call on the other hand
*map = mmap(
NULL,
BLOCK_SIZE, // = (4 * 1024)
PROT_READ | PROT_WRITE,
MAP_SHARED,
*mem_fd,
*addr_p
);
fails with mmap error: Invalid argument
Here's a more complete version of the code: http://pastebin.com/mfEuVdbJ
Don't worry about the pointers as the same code does work when accessing UART0 at 0x01C28000.
Although only UART0 (and UART4), which is used as serial console.
I've decompiled the script.bin (still in use despite DTB) without success, as UART 0, 7 and 8 are enabled there.
I am also logged in as user root
I would still guess something related to permissions but I'm pretty lost right now since devmem has no problem at all
> root#a20-olimex:~# devmem2 0x01c20800 w /dev/mem opened. Memory mapped
> at address 0xb6f85000.

While sourcejedi didn't certainly fix my issue, he gave me the right approach.
I took a look at the forementioned devmem tool's source to discover that the mmap call's address is masked
address & ~MAP_MASK to get the entire page, which is essentially the same operation as in my comment.
However, to get back to the right place after the mapping has been done, you have to add the mask back
final_address = mapped_address + (target_address & MAP_MASK);
This resulted in following code (based on OP's pastebin)
Where
MAP_MASK = (sysconf(_SC_PAGE_SIZE) - 1) in this case 4095
int map_peripheral(unsigned long *addr_p, int *mem_fd, void **map, volatile unsigned int **addr)
{
if (!(*addr_p)) {
printf("Called map_peripheral with uninitilized struct.\n");
return -1;
}
// Open /dev/mem
if ((*mem_fd = open("/dev/mem", O_RDWR | O_SYNC)) < 0) {
printf("Failed to open /dev/mem, try checking permissions.\n");
return -1;
}
*map = mmap(
NULL,
MAP_SIZE,
PROT_READ | PROT_WRITE,
MAP_SHARED,
*mem_fd, // file descriptor to physical memory virtual file '/dev/mem'
*addr_p & ~MAP_MASK // address in physical map to be exposed
/************* magic is here **************************************/
);
if (*map == MAP_FAILED) {
perror("mmap error");
return -1;
}
*addr = (volatile unsigned int *)(*map + (*addr_p & MAP_MASK));
/************* and here ******************************************/
return 0;
}

if you read the friendly manual
EINVAL Invalid argument (POSIX.1)
is the error code. (Not EPERM!). So we look it up for the specific function
EINVAL We don't like addr, length, or offset (e.g., they are too large,
or not aligned on a page boundary).
BLOCK_SIZE, // 1024 - ?
You want a multiple of sysconf(_SC_PAGE_SIZE). In practice it will be 4096. I won't bother with the fully general math to calculate it - you'll find examples if you need it.

Copying files using memory map

I want to implement an effective file copying technique in C for my process which runs on BSD OS. As of now the functionality is implemented using read-write technique. I am trying to make it optimized by using memory map file copying technique.
Basically I will fork a process which mmaps both src and dst file and do memcpy() of the specified bytes from src to dst. The process exits after the memcpy() returns. Is msync() required here, because when I actually called msync with MS_SYNC flag, the function took lot of time to return. Same behavior is seen with MS_ASYNC flag as well?
i) So to summarize is it safe to avoid msync()?
ii) Is there any other better way of copying files in BSD. Because bsd seems to be does not support sendfile() or splice()? Any other equivalents?
iii) Is there any simple method for implementing our own zero-copy like technique for this requirement?
My code
/* mmcopy.c
Copy the contents of one file to another file, using memory mappings.
Usage mmcopy source-file dest-file
*/
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include "tlpi_hdr.h"
int
main(int argc, char *argv[])
{
char *src, *dst;
int fdSrc, fdDst;
struct stat sb;
if (argc != 3)
usageErr("%s source-file dest-file\n", argv[0]);
fdSrc = open(argv[1], O_RDONLY);
if (fdSrc == -1)
errExit("open");
/* Use fstat() to obtain size of file: we use this to specify the
size of the two mappings */
if (fstat(fdSrc, &sb) == -1)
errExit("fstat");
/* Handle zero-length file specially, since specifying a size of
zero to mmap() will fail with the error EINVAL */
if (sb.st_size == 0)
exit(EXIT_SUCCESS);
src = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fdSrc, 0);
if (src == MAP_FAILED)
errExit("mmap");
fdDst = open(argv[2], O_RDWR | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
if (fdDst == -1)
errExit("open");
if (ftruncate(fdDst, sb.st_size) == -1)
errExit("ftruncate");
dst = mmap(NULL, sb.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fdDst, 0);
if (dst == MAP_FAILED)
errExit("mmap");
memcpy(dst, src, sb.st_size); /* Copy bytes between mappings */
if (msync(dst, sb.st_size, MS_SYNC) == -1)
errExit("msync");
enter code here
exit(EXIT_SUCCESS);
}

Short answer: msync() is not required.
When you do not specify msync(), the operating system flushes the memory-mapped pages in the background after the process has been terminated. This is reliable on any POSIX-compliant operating system.
To answer the secondary questions:
Typically the method of copying a file on any POSIX-compliant operating system (such as BSD) is to use open() / read() / write() and a buffer of some size (16kb, 32kb, or 64kb, for example). Read data into buffer from src, write data from buffer into dest. Repeat until read(src_fd) returns 0 bytes (EOF).
However, depending on your goals, using mmap() to copy a file in this fashion is probably a perfectly viable solution, so long as the files being coped are relatively small (relative to the expected memory constraints of your target hardware and your application). The mmap copy operation will require roughly 2x the total physical memory of the file. So if you're trying to copy a file that's a 8MB, your application will use 16MB to perform the copy. If you expect to be working with even larger files then that duplication could become very costly.
So does using mmap() have other advantages? Actually, no.
The OS will often be much slower about flushing mmap pages than writing data directly to a file using write(). This is because the OS will intentionally prioritize other things ahead of page flushes so to keep the system 'responsive' for foreground tasks/apps.
During the time the mmap pages are being flushed to disk (in the background), the chance of sudden loss of power to the system will cause loss of data. Of course this can happen when using write() as well but if write() finishes faster then there's less chance for unexpected interruption.
the long delay you observe when calling msync() is roughly the time it takes the OS to flush your copied file to disk. When you don't call msync() it happens in the background instead (and also takes even longer for that reason).

Predict malloc block sizes grid in C

I'm trying to optimize my dynamic memory usage. The thing is that I initially allocate some amount of memory for the data I get from a socket. Then, on the new data arrival I'm reallocating memory so the newly arrived part will fit into the local buffer. After some poking around I've found that malloc actually allocates a greater block than requested. In some cases significantly greater; here comes some debug info from malloc_usable_size(ptr):
requested 284 bytes, allocated 320 bytes
requested 644 bytes, reallocated 1024 bytes
It's well known that malloc/realloc are expensive operations. In most cases newly arrived data will fit into a previously allocated block (at least when I requested 644 byes and get 1024 instead), but I have no idea how I can figure that out.
The trouble is that malloc_usable_size should not be relied upon (as described in manual) and if the program requested 644 bytes and malloc allocated 1024, the excess 644 bytes may be overwritten and can not be used safely. So, using malloc for a given amount of data and then use malloc_usable_size to figure out how many bytes were really allocated isn't the way to go.
What I want is to know the block grid before calling malloc, so I will request exactly the maximum amount of bytes greater then I need, store allocated size and on the realloc check if I really need to realloc, or if the previously allocated block is fine just because it's greater.
In other words, if I were to request 644 bytes, and malloc actually gave me 1024, I want to have predicted that and requested 1024 instead.

Depending on your particular implementation of libc you will have different behaviour. I have found in most cases two approaches to do the trick:
Use the stack, this is not always feasible, but C allows VLAs on the stack and is the most effective if you don't intend to pass your buffer to an external thread
while (1) {
char buffer[known_buffer_size];
read(fd, buffer, known_buffer_size);
// use buffer
// released at the end of scope
}
In Linux you can make excellent use of mremap which can enlarge/shrink memory with zero-copy guaranteed. It may move your VM mapping though. Only problem here is that it only works in chunks of system page size sysconf(_SC_PAGESIZE) which is usually 0x1000.
void * buffer = mmap(NULL, init_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
while(1) {
// if needs remapping
{
// zero copy, but involves a system call
buffer = mremap(buffer, new_size, MREMAP_MAYMOVE);
}
// use buffer
}
munmap(buffer, current_size);
OS X has similar semantics to Linux's mremap through the Mach vm_remap, it's a little more compilcated though.
void * buffer = mmap(NULL, init_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
mach_port_t this_task = mach_task_self();
while(1) {
// if needs remapping
{
// zero copy, but involves a system call
void * new_address = mmap(NULL, new_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
vm_prot_t cur_prot, max_prot;
munmap(new_address, current_size); // vm needs to be empty for remap
// there is a race condition between these two calls
vm_remap(this_task,
&new_address, // new address
current_size, // has to be page-aligned
0, // auto alignment
0, // remap fixed
this_task, // same task
buffer, // source address
0, // MAP READ-WRITE, NOT COPY
&cur_prot, // unused protection struct
&max_prot, // unused protection struct
VM_INHERIT_DEFAULT);
munmap(buffer, current_size); // remove old mapping
buffer = new_address;
}
// use buffer
}

The short answer is that the standard malloc interface does not provide the information you are looking for. To use the information breaks the abstraction provided.
Some alternatives are:
Rethink your usage model. Perhaps pre-allocate a pool of buffers at start, filling them as you go. Unfortunately this could complicate your program more than you would like.
Use a different memory allocation library that does provide the needed interface. Different libraries provide different tradeoffs in terms of fragmentation, max run time, average run time, etc.
Use your OS memory allocation API. These are often written to be efficient, but will generally require a system call (unlike a user-space library).

In my professional code, I often take advantage of the actual size allocated by malloc()[etc], rather than the requested size. This is my function for determining the actual allocation size0:
int MM_MEM_Stat(
void *I__ptr_A,
size_t *_O_allocationSize
)
{
int rCode = GAPI_SUCCESS;
size_t size = 0;
/*-----------------------------------------------------------------
** Validate caller arg(s).
*/
#ifdef __linux__ // Not required for __APPLE__, as alloc_size() will
// return 0 for non-malloc'ed refs.
if(NULL == I__ptr_A)
{
rCode=EINVAL;
goto CLEANUP;
}
#endif
/*-----------------------------------------------------------------
** Calculate the size.
*/
#if defined(__APPLE__)
size=malloc_size(I__ptr_A);
#elif defined(__linux__)
size=malloc_usable_size(I__ptr_A);
#else
!##$%
#endif
if(0 == size)
{
rCode=EFAULT;
goto CLEANUP;
}
/*-----------------------------------------------------------------
** Return requested values to caller.
*/
if(_O_allocationSize)
*_O_allocationSize = size;
CLEANUP:
return(rCode);
}

I did some sore research and found two interesting things about malloc realization in Linux and FreeBSD:
1) in Linux malloc increment blocks linearly in 16 byte steps, at least up to 8K, so no optimization needed at all, it's just not reasonable;
2) in FreeBSD situation is different, steps are bigger and tend to grow up with requested block size.
So, any kind of optimization is needed only for FreeBSD as Linux allocates blocks with a very tiny steps and it's very unlikely to receive less then 16 bytes of data from socket.