My application writes huge files to disk. For various reasons it is more convenient to use mmap and memory writes than the fwrite interface.
The slow part of writing to a file this way is page faults. Using mmap with MAP_POPULATE is supposed to help; from the man page:
MAP_POPULATE (since Linux 2.5.46)
Populate (prefault) page tables for a mapping. For a file mapping, this causes read-ahead on the file. This will help to
reduce blocking on page faults later. MAP_POPULATE is supported for private mappings only since Linux 2.6.23.
(To answer the obvious question: I've tested this on relatively recent kernels, on 4.15 and 5.1).
However, this does not seem to reduce page faults while writing to the mapped file.
Minimal example code: test.c:
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>
static int exit_with_error(const char *msg) {
    perror(msg);
    exit(EXIT_FAILURE);
}

int main() {
    const size_t len = 1UL << 30;
    const char *fname = "/tmp/foobar-huge.txt";

    int f = open(fname, O_RDWR|O_CREAT|O_EXCL, 0644);
    if(f == -1) {
        exit_with_error("open");
    }

    int ret = ftruncate(f, len);
    if(ret == -1) {
        exit_with_error("ftruncate");
    }

    void *mem = mmap(NULL, len, PROT_WRITE|PROT_READ, MAP_SHARED|MAP_POPULATE, f, 0);
    if(mem == MAP_FAILED) {
        exit_with_error("mmap");
    }

    ret = close(f);
    if(ret == -1) {
        exit_with_error("close");
    }

    memset(mem, 'f', len);
}
When running this under a profiler or using perf stat it's clearly visible that the memset at the end triggers (many) pagefaults.
In fact, this program is slower when MAP_POPULATE is passed: on my machine ~1.8s vs ~1.6s without MAP_POPULATE. The difference seems to be simply the time it takes to do the populate; the number of page faults that perf stat reports is identical.
A final observation is that this behaves as expected when I read from the file instead of writing -- in that case MAP_POPULATE reduces the number of page faults to almost zero and improves performance drastically.
Is this the expected behavior for MAP_POPULATE? Am I doing something wrong?
Because although the pages have been prefaulted by MAP_POPULATE, they are clean, and the kernel maps clean file pages read-only. The first write to each page therefore still takes a minor fault, which the kernel uses to mark the page table entry dirty and writable so that writeback can track the modified pages.
I have the following simple program, which basically just mmaps a file and sums every byte in it:
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
volatile uint64_t sink;
int main(int argc, char** argv) {
    if (argc < 3) {
        puts("Usage: mmap_test FILE populate|nopopulate");
        return EXIT_FAILURE;
    }
    const char *filename = argv[1];
    int populate = !strcmp(argv[2], "populate");

    uint8_t *memblock;
    int fd;
    struct stat sb;

    fd = open(filename, O_RDONLY);
    fstat(fd, &sb);
    uint64_t size = sb.st_size;

    memblock = mmap(NULL, size, PROT_READ, MAP_SHARED | (populate ? MAP_POPULATE : 0), fd, 0);
    if (memblock == MAP_FAILED) {
        perror("mmap failed");
        return EXIT_FAILURE;
    }
    //printf("Opened %s of size %lu bytes\n", filename, size);

    uint64_t i;
    uint8_t result = 0;
    for (i = 0; i < size; i++) {
        result += memblock[i];
    }
    sink = result;

    puts("Press enter to exit...");
    getchar();
    return EXIT_SUCCESS;
}
I make it like this:
gcc -O2 -std=gnu99 mmap_test.c -o mmap_test
You pass it a file name and either populate or nopopulate1, which controls whether MAP_POPULATE is passed to mmap or not. It waits for you to type enter before exiting (giving you time to check out stuff in /proc/<pid> or whatever).
I use a 1GB test file of random data, but you can really use anything:
dd bs=1MB count=1000 if=/dev/urandom of=/dev/shm/rand1g
When MAP_POPULATE is used, I expect zero major faults and a small number of page faults for a file in the page cache. With perf stat I get the expected result:
perf stat -e major-faults,minor-faults ./mmap_test /dev/shm/rand1g populate
Press enter to exit...
Performance counter stats for './mmap_test /dev/shm/rand1g populate':
0 major-faults
45 minor-faults
1.323418217 seconds time elapsed
The 45 faults just come from the runtime and process overhead (and don't depend on the size of the file mapped).
However, /usr/bin/time reports ~15,300 minor faults:
/usr/bin/time ./mmap_test /dev/shm/rand1g populate
Press enter to exit...
0.05user 0.05system 0:00.54elapsed 20%CPU (0avgtext+0avgdata 977744maxresident)k
0inputs+0outputs (0major+15318minor)pagefaults 0swaps
The same ~15,300 minor faults is reported by top and by examining /proc/<pid>/stat.
Now if you don't use MAP_POPULATE, all the methods, including perf stat, agree there are ~15,300 page faults. For what it's worth, this number comes from 1,000,000,000 / 4096 / 16 ≈ 15,250 -- that is, 1 GB divided into 4K pages, with an additional factor-of-16 reduction coming from a kernel feature ("fault-around") which maps in up to 15 nearby pages that are already present in the page cache when a fault is taken.
Who is right here? Based on the documented behavior of MAP_POPULATE, the figure returned by perf stat is the correct one - the single mmap call has already populated the page tables for the entire mapping, so there should be no more minor faults when touching it.
1Actually, any string other than populate works as nopopulate.
My problem is dealing with sparse file reads: I need to understand where the extents of the file are in order to perform some logic around them.
Since there is no direct API call to figure this out, I decided to use the ioctl API. I got the idea from how the cp command deals with copying sparse files; going through their code, I ended up at this:
https://github.com/coreutils/coreutils/blob/df88fce71651afb2c3456967a142db0ae4bf9906/src/extent-scan.c#L112
So I tried to do the same thing in my sample program running in user space, and it errors out with "Invalid argument". I am not sure what I am missing or whether this is even possible from userspace. I am running Ubuntu 14.04 on an ext4 file system. Could this be a problem with the device driver supporting these request modes underneath?
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/fcntl.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include "fiemap.h" //This is from https://github.com/coreutils/coreutils/blob/df88fce71651afb2c3456967a142db0ae4bf9906/src/fiemap.h
int main(int argc, char* argv[]) {
    int input_fd;
    if (argc != 2) {
        printf("Usage: ioctl file1");
        return 1;
    }

    /* Create input file descriptor */
    input_fd = open(argv[1], O_RDWR);
    if (input_fd < 0) {
        perror("open");
        return 2;
    }

    union { struct fiemap f; char c[4096]; } fiemap_buf;
    struct fiemap *fiemap = &fiemap_buf.f;

    int s = ioctl(input_fd, FS_IOC_FIEMAP, fiemap);
    if (s == 0) {
        printf("ioctl success\n");
    } else {
        printf("ioctl failure\n");
        char *errmsg = strerror(errno);
        printf("error: %d %s\n", errno, errmsg);
    }

    /* Close file descriptors */
    close(input_fd);
    return s;
}
Since you're not properly setting the fiemap_buf.f parameters before invoking ioctl(), the EINVAL is more likely coming from invalid fiemap contents than from lack of support for the FS_IOC_FIEMAP request identifier itself.
For instance, ioctl_fiemap() (in the kernel) will evaluate fiemap.fm_extent_count to determine whether it is greater than FIEMAP_MAX_EXTENTS, and return -EINVAL in that case. Since no memory reset nor parameterization is performed on fiemap, this is very likely the root cause of the problem.
Note that the coreutils code you referenced performs the correct parameterization of fiemap before calling ioctl():
    fiemap->fm_start = scan->scan_start;
    fiemap->fm_flags = scan->fm_flags;
    fiemap->fm_extent_count = count;
    fiemap->fm_length = FIEMAP_MAX_OFFSET - scan->scan_start;
Note that fiemap is not recommended, as you have to be sure to pass FIEMAP_FLAG_SYNC, which has side effects. The lseek() interface with SEEK_DATA and SEEK_HOLE is the recommended one, though note that, depending on the file system, it may represent unwritten extents (allocated zeros) as holes.
I'm working on a project that searches for specific bytes (e.g. 0xAB) in a filesystem (e.g. ext2). I was able to find what I needed using malloc(), realloc(), and memchr(), but it seemed slow, so I looked into using mmap(). What I am trying to do is find specific bytes and then copy them into a struct, so I have two questions: (1) is using mmap() the best strategy, and (2) why isn't the following code working (I get an EINVAL error)?
UPDATE: The following program compiles and runs but I still have a couple issues:
1) it won't display the correct file size on large files (it displayed the correct size for a 1 GB flash drive, but not for 32 GB)*;
2) it's not searching the mapping correctly**.
*Is THIS a possible solution to getting the correct size using stat64()? If so, is it something I add in my Makefile? I haven't worked with makefiles much so I don't know how to add something like that.
**Is this even the proper way to search?
#define _LARGEFILE64_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>   /* needed for mmap/munmap */
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <errno.h>

#define handle_error(msg) \
    do { perror(msg); exit(EXIT_FAILURE); } while (0)

int main(int argc, char **argv) {
    int fd = open("/dev/sdb1", O_RDONLY);
    if (fd < 0) {
        printf("Error %s\n", strerror(errno));
        return -1;
    }

    /* unsigned char: with plain (signed) char, map[i+1] == 0xEF can never
       be true, so the search silently finds nothing */
    const unsigned char *map;
    off64_t size = lseek64(fd, 0, SEEK_END);
    printf("file size: %llu\n", (unsigned long long)size);
    lseek64(fd, 0, SEEK_SET);

    map = mmap(0, size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { handle_error("mmap error"); }

    printf("Searching for magic numbers...\n");
    off64_t sb_pos[5];
    int j = 0;
    /* start at 32 so map[i-32] is valid; stop at size-1 so map[i+1] is valid */
    for (off64_t i = 32; i + 1 < size; i++) {
        if (map[i] == 0x53 && map[i + 1] == 0xEF) {
            if ((map[i-32] == 0x00 && map[i-31] == 0x00) ||
                (map[i-32] == 0x01 && map[i-31] == 0x00) ||
                (map[i-32] == 0x02 && map[i-31] == 0x00)) {
                if (j < 5) {
                    sb_pos[j] = i;
                    printf("superblock %d found\n", j + 1);
                    ++j;
                } else {
                    break;
                }
            }
        }
    }

    for (int q = 0; q < j; q++) {
        printf("SUPERBLOCK[%d]: %lld\n", q + 1, (long long)sb_pos[q]);
    }

    close(fd);   /* fd came from open(), so close(), not fclose() */
    munmap((void *)map, size);
    return 0;
}
Thanks for your help.
mmap is a very efficient way to handle searching a large file, especially in cases where there's an internal structure you can use (e.g. using mmap on a large file with fixed-size records that are sorted would permit you to do a binary search, and only the pages corresponding to records read would be touched).
In your case you need to compile for 64 bits and enable large file support (and use open(2)).
If your /dev/sdb1 is a device and not a file, I don't think stat(2) will show an actual size. stat returns a size of 0 for these devices on my boxes. I think you'll need to get the size another way.
Regarding address space: x86-64 uses 2^48 bytes of virtual address space, which is 256 TiB. You can't use all of that, but there's easily ~127 TiB of contiguous address space in most processes.
I just noticed that I was using fopen(), should I be using open() instead?
Yes, you should use open() instead of fopen(), and that's the reason you got the EINVAL error.
fopen("/dev/sdb1", O_RDONLY);
This code is incorrect: O_RDONLY is a flag meant for the open() syscall, not for the fopen() libc function, which expects a mode string such as "r".
You should also note that mmapping large files is only feasible on a platform with a large virtual address space; obviously, you need enough virtual memory to address the whole file. Speaking of Intel, that means x86_64 only, not x86_32.
I haven't tried this with really large files (>4 GB). Maybe some additional flags need to be passed to the open() syscall.
I'm working on a project that is trying to search for specific bytes (e.g. 0xAB) in a filesystem (e.g. ext2)
mmap()ing the entire large file into memory is the wrong approach in your case. You just need to process the file step by step in fixed-size chunks (something around 1 MB). You can use mmap() or just read() into an internal buffer -- it doesn't matter. But pulling the whole file into memory is overkill if you only want to process it sequentially.
I want to shred some temp files produced by my C program before the files are removed.
Currently I am using
system("shred /tmp/datafile");
system("rm /tmp/datafile");
from within my program, but I think calling the system function is not the best way (correct me if I am wrong). Is there any other way I can do it? How do I shred the file from within my code itself -- a library, or anything? Also, about the deletion part, is this answer good?
Can I ask why you think this is not the best way to achieve this? It looks like a good solution to me, if it is genuinely necessary to destroy the file contents irretrievably.
The advantages of this way of doing it are:
the program already exists (so it's faster to develop); and
the program is already trusted.
The second is an important point. It's possible to overstate the necessity of elaborately scrubbing files (Peter Gutmann, in a remark quoted on the relevant wikipedia page, has described some uses of his method as ‘voodoo’), but that doesn't matter: in any security context, using a pre-existing tool is almost always more defensible than using something home-made.
About the only criticism I'd make of your current approach, using system(3), is that since it looks up the shred program in the PATH, it would be possible in principle for someone to play games with that and get up to mischief. But that's easily dealt with: use fork(2) and execve(2) to invoke a specific binary using its full path.
That said, if this is just a low-impact bit of tidying up, then it might be still more straightforward to simply mmap the file and quickly write zeros into it.
You can use the following code:
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <stdio.h>
#define BUF_SIZE 4096
#define ABS_FILE_PATH "/tmp/aaa"
int main()
{
    //get file size
    struct stat stat_buf;
    if (stat(ABS_FILE_PATH, &stat_buf) == -1)
        return errno;
    off_t fsize = stat_buf.st_size;

    //get file for writing
    int fd = open(ABS_FILE_PATH, O_WRONLY);
    if (fd == -1)
        return errno;

    //fill file with 0s
    void *buf = malloc(BUF_SIZE);
    memset(buf, 0, BUF_SIZE);
    ssize_t ret = 0;
    off_t shift = 0;
    while ((ret = write(fd, buf,
                        ((fsize - shift > BUF_SIZE) ?
                         BUF_SIZE : (fsize - shift)))) > 0)
        shift += ret;
    close(fd);
    free(buf);
    if (ret == -1)
        return errno;

    //remove file
    if (remove(ABS_FILE_PATH) == -1)
        return errno;
    return 0;
}
I am working on an application where I need to compare 10^8 entries (alphanumeric). Retrieving the entries from the file (file size is 1.5 GB) and then comparing them needs to take less than 5 minutes, but retrieval alone is already exceeding 5 minutes. What would be an effective way to do this? I need to work on the file only. Please suggest a way out.
I am working on Windows with 3 GB RAM and a 100 GB hard disk.
Read a part of the file, sort it, write it to a temporary file.
Merge-sort the resulting files.
Error handling and header includes are not included. You need to provide DataType and cmpfunc; samples are provided. You should be able to deduce the core workings from this snippet:
#define _GNU_SOURCE   /* for O_LARGEFILE */
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

typedef char DataType; // is this alphanumeric?

// qsort comparators take const void * arguments
int cmpfunc(const void *left, const void *right)
{
    return *(DataType const *)right - *(DataType const *)left;
}

int main(int argc, char **argv)
{
    int fd = open(argv[1], O_RDWR|O_LARGEFILE);
    if (fd == -1)
        return 1;

    struct stat st;
    if (fstat(fd, &st) != 0)
        return 1;

    DataType *data = mmap(NULL, st.st_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    if (data == MAP_FAILED)   // mmap signals failure with MAP_FAILED, not NULL
        return 1;

    qsort(data, st.st_size / sizeof(*data), sizeof(*data), cmpfunc);

    if (0 != msync(data, st.st_size, MS_SYNC))
        return 1;
    if (-1 == munmap(data, st.st_size))
        return 1;
    if (0 != close(fd))
        return 1;
    return 0;
}
I can't imagine you can get much faster than this. Be sure you have enough virtual memory address space (1.5GB is pushing it but will probably just work on 32bit Linux, you'll be able to manage this on any 64bit OS). Note that this code is "limited" to working on a POSIX compliant system.
In terms of C and efficiency, this approach puts the entire operation in the hands of the OS, and the excellent qsort algorithm.
If retrieval time alone exceeds 5 minutes, you need to look at how you are reading the file. One thing that has caused bad performance for me is that a C implementation sometimes uses thread-safe I/O operations by default, and you can gain some speed by using the thread-unsafe variants.
What kind of computer will this be run on? Many computers nowadays have several gigabytes of memory, so perhaps it will work to just read it all into memory and then sort it there (with, for example, qsort)?