I'm new to memory-mapped I/O and was wondering whether there is a way to read a text file into a string using memory mapping. I don't really know how to start writing the code.
The general idea with memory mapped I/O is that you tell the operating system (OS) what file you want, and it (after doing some amount of set-up work) tells you where that file now is in memory.
Once that contract is performed, you should be able to copy things to and from that memory in any way you wish (such as with memcpy), and it will magically handle the I/O for you.
The details depend on which OS you're using, since the ISO C standard doesn't specify this behaviour; it is therefore OS-specific.
For example, Windows uses a file mapping paradigm (CreateFileMapping followed by MapViewOfFile), while Linux uses mmap to let you memory-map a file you've already opened (via its file descriptor).
By way of example, this Linux program, a little voluminous due mainly to its error checking and progress reports, memory maps the file.txt file and outputs its content:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
// Helper function to minimise error code in main.
static int clean(
int retVal, // value to return.
char *err, // error/NULL, allows optional %s for strerror(errno).
int fd, // fd/-1 to close.
void *filMem, // memory/NULL to unmap.
off_t sz, // size if unmapping.
void *locMem // memory/NULL to free.
) {
if (err) printf (err, strerror(errno));
if (locMem) free(locMem);
if (filMem) munmap(filMem, sz);
if (fd >= 0) close(fd);
return retVal;
}
int main(void) {
int fd = open("file.txt", O_RDONLY);
if (fd < 0) return clean(-1, "Can't open: %s\n", -1, NULL, 0, NULL);
printf("File opened okay, fd = %d.\n", fd);
off_t sz = lseek(fd, 0, SEEK_END);
if (sz == (off_t) -1) return clean(-1, "Can't seek: %s\n", fd, NULL, 0, NULL);
printf("File size is %ld.\n", (long) sz);
void *fileArea = mmap(NULL, sz, PROT_READ, MAP_SHARED, fd, 0);
if (fileArea == MAP_FAILED) return clean(-1, "Can't map: %s\n", fd, NULL, 0, NULL); // mmap reports failure with MAP_FAILED, not NULL.
printf("File mapped to address %p.\n", fileArea);
char *localArea = calloc(1, sz + 1);
if (! localArea) return clean(-1, "Can't allocate\n", fd, fileArea, sz, NULL);
memcpy(localArea, fileArea, sz);
printf("Data copied to %p, result is [\n%s]\n", localArea, localArea);
return clean(0, NULL, fd, fileArea, sz, localArea);
}
Running that on my local system, the results can be seen from the following transcript:
pax$ cat file.txt
...This is the input file content.
pax$ ./testprog
File opened okay, fd = 3.
File size is 35.
File mapped to address 0x7f868a93b000.
Data copied to 0x1756420, result is [
...This is the input file content.
]
I am trying to read and write a struct using mmap, but the changes I make after the mmap are not persisted to disk, or are not loaded correctly.
Here's my code sample. On the first run the file is created and the prints show the correct data; on the second run, when the file already exists, the struct is empty, filled with 0s. So it looks like the changes were not written to the file, but I am having trouble figuring out why.
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>
typedef struct {
int age;
char name[128];
} person;
int main(int argc, char *argv[argc]){
char filename [] = "data.person";
int fd = open(filename, O_RDWR | O_CREAT , S_IRWXU);
if (fd == -1) {
printf("Failed to create version vector file, error is '%s'", strerror(errno));
exit(1);
}
struct stat st;
fstat(fd, &st);
person *p;
if (st.st_size == 0) {
ftruncate(fd, sizeof(person));
p = (person *) mmap(0, sizeof(person), PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
strcpy(p[0].name, "Hello");
p[0].age = 10;
msync(p, sizeof(person), MS_SYNC);
}else{
p = (person *) mmap(0, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
if( p == MAP_FAILED){
printf("mmap failed with error '%s'\n", strerror(errno));
exit(0);
}
}
printf("%s %d\n", p->name, p->age);
munmap(p, sizeof(person));
close(fd);
}
My OS is manjaro 20 and the gcc version is 10.1
Do not use MAP_PRIVATE because:
Create a private copy-on-write mapping. Updates to the
mapping are not visible to other processes mapping the same
file, and are not carried through to the underlying file. It
is unspecified whether changes made to the file after the
mmap() call are visible in the mapped region.
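As a minimal sketch of what that implies for the code in the question, the mapping can be created with MAP_SHARED so that stores reach the file. The helper names save_person and load_person are made up for this illustration, and the owner-only permission bits are an assumption rather than something the question requires.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct {
    int age;
    char name[128];
} person;

/* Hypothetical helper: writes one person record to `path` through a
 * MAP_SHARED mapping so that the stores are carried through to the file.
 * Returns 0 on success, -1 on error. */
int save_person(const char *path, const char *name, int age) {
    int fd = open(path, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
    if (fd == -1) return -1;
    if (ftruncate(fd, sizeof(person)) == -1) { close(fd); return -1; }

    person *p = mmap(NULL, sizeof(person), PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }

    strncpy(p->name, name, sizeof(p->name) - 1);
    p->name[sizeof(p->name) - 1] = '\0';
    p->age = age;

    int rc = msync(p, sizeof(person), MS_SYNC); /* flush dirty pages to the file */
    munmap(p, sizeof(person));
    close(fd);
    return rc;
}

/* Hypothetical helper: reads the record back with plain read(2),
 * without any mapping, to confirm the data is really in the file. */
int load_person(const char *path, person *out) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) return -1;
    ssize_t n = read(fd, out, sizeof(*out));
    close(fd);
    return n == (ssize_t)sizeof(*out) ? 0 : -1;
}
```

With MAP_PRIVATE in place of MAP_SHARED, the same stores would stay in the process's private copy and load_person would read back zeros.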
I am running the following C code, which tries to read into a buffer
allocated on the caller's stack, but fails with errno 14 (Bad address).
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
void wrapper(int fd, char **buf)
{
int res = read(fd, *buf, 10);
printf("res: %d, errno: %d\n", res, errno);
printf("Buf: %s\n", *buf);
}
int main()
{
char buffer[10];
memset(buffer, 0, 10);
int fd = open("main.c", O_RDONLY);
wrapper(fd, (char **)&buffer);
return 0;
}
The output is
res: -1, errno: 14
Buf: (null)
I have been searching for an explanation of why it fails, whereas changing it to
void wrapper(int fd, char *buf)
...
wrapper(fd, (char *)buffer);
works, but without result so far.
why it fails
Arrays are not pointers. buffer is not a char*. Consequently, &buffer is not a char**, is not compatible with char**, and should not be cast to char**. If it is cast to char** and then dereferenced, the behaviour is undefined.
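A small sketch of the working variant, under the assumption that the wrapper only needs to fill a caller-owned buffer. The name wrapper_fixed and the cap parameter are invented for this example.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical corrected wrapper: it receives the buffer itself.
 * In a call, the array `buffer` decays to `char *` (a pointer to its first
 * element); `&buffer` would instead have type `char (*)[10]`. */
ssize_t wrapper_fixed(int fd, char *buf, size_t cap) {
    if (cap == 0) return -1;
    ssize_t n = read(fd, buf, cap - 1); /* leave room for the terminator */
    if (n >= 0) buf[n] = '\0';
    return n;
}
```

It would be called as wrapper_fixed(fd, buffer, sizeof buffer), with no cast at all. Needing a cast to make the call compile is usually a sign that the types do not match.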
Having looked at what you are trying to do: it is certainly possible to write a wrapper() that reads a string with the read(2) syscall and makes that buffer available outside the function. You wanted to pass the number of characters to read from the file whose descriptor (an index into the process's table of open files) was returned by the open(2) syscall. But as n.m. said, arrays are not pointers, so your solution cannot work properly.
Let me explain my simple fix to your code:
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>
#define AMOUNT 20
#define assert_msg(x) for ( ; !(x) ; assert(x) )
void
wrapper(int fd, char **buf, size_t size)
{
ssize_t res;
char *out;
out = calloc(size + 1, sizeof(char));
assert(out != NULL);
res = read(fd, out, size);
assert_msg(res != -1) {
fprintf(stderr, "Error occurred: %s\n", strerror(errno));
}
out[size] = '\0';
fprintf(stdout, "Inside function: %s\n", out);
fprintf(stdout, "res: %zd, size: %zu, errno: (%d: %s)\n", res, size,
errno, strerror(errno));
*buf = out;
}
int
main(int argc, char **argv)
{
int fd;
char *buf;
buf = NULL;
assert(argc == 2);
errno = 0;
fd = open(argv[1], O_RDONLY);
assert_msg(fd != -1) {
fprintf(stderr, "Error occurred: %s\n", strerror(errno));
}
wrapper(fd, &buf, AMOUNT);
fprintf(stdout, "Outside function: %s\n", buf);
free(buf);
return (EXIT_SUCCESS);
}
I pass the filename as a command-line argument. That was a bit easier for me than hardcoding the name.
As you can see, inside my wrapper() implementation I allocate memory for an out buffer whose size is passed in the size parameter. Here it has the same value as the AMOUNT macro, but it would be easy to change in any other solution.
Then I read the given number of characters using the read(2) syscall, from the file descriptor that open(2) returned in main() and that I pass to wrapper().
At the end of that function I store the address of the start of the allocated out buffer in *buf. It is a buffer of size + 1 char elements allocated on the heap, not on the local stack, so the program cannot reuse those addresses while it runs. Every variable declared like int a;, struct type name; or char tab[10]; inside a function is released automatically when the function returns, and you must not access it afterwards. To be clear, the access may appear to work (e.g. printing data through a saved pointer), but you cannot be sure the data stored there has not been overwritten. Manually allocated space continues to exist on the heap until free(3) is called.
So if we did something like:
void
wrapper(int fd, char **buf, const size_t size)
{
ssize_t res;
char out[size];
(...)
*buf = out;
}
you may lose the data, because it lives on the local stack, which is reused as the program continues executing.
Additionally, in my solution I defined my own macro assert_msg(x), which prints a text message before assert(3) aborts. It is only a convenience, but thanks to it we can see the string corresponding to the errno number.
Of course, my program needs better error handling, but it only has to present the idea.
Furthermore, whenever you pass O_CREAT to open(2) you must also specify the file permissions as a third argument. It looks similar to the second argument because it is a bitwise-OR'd list of values, for example S_IRUSR, S_IRGRP, S_IWOTH etc.
In that argument you can also just write a numeric value describing the permissions, for example 0755.
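For illustration, a call that creates a file might look like the sketch below. The helper name create_owner_only and the choice of owner-only permissions are assumptions made for this example.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: creates (or truncates) `path` so that it is readable
 * and writable by the owner only. The third argument to open(2) is needed
 * here because O_CREAT is set. Returns the descriptor, or -1 on error. */
int create_owner_only(const char *path) {
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
}
```

S_IRUSR | S_IWUSR is the same value as 0600. The process umask can clear further bits from whatever mode is requested.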
I have seen that a device file can be accessed directly in Linux and I want to give it a try. I have a free disk partition without any file system. My test code is below.
I expect to get the output read data: 199 when I run the program the second time. But I actually get the output read data: 0 both times. No errors occur while the program runs. I have no idea what is wrong.
Thanks for your time.
Test Code:
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
int main(){
int num = 0;
int fd = open("/dev/sda6", O_RDWR);
if(fd == -1){
fprintf(stderr, "open device failed, errno : %s(%d) \n",
strerror(errno), errno);
return 1;
}
ssize_t ret = read(fd, &num, sizeof(int));
if(ret != sizeof(int)){
fprintf(stderr, "read fails, errno : %s(%d) \n",
strerror(errno), errno);
return 1;
}
printf("read data: %d\n", num);
num = 199;
ret = write(fd, &num, sizeof(int));
if(ret != sizeof(int)){
fprintf(stderr, "write fails, errno : %s(%d) \n",
strerror(errno), errno);
return 1;
}
close(fd);
return 0;
}
The read and write start reading/writing at the implicit file offset stored in the descriptor, and increment it by the number of bytes read/written. Therefore, you would now read bytes 0 .. 3, and then write bytes 4 .. 7.
Instead of read and write and messing with lseek et al, use the POSIX standard pread and pwrite that do not use the implicit file offset in the descriptor but take explicit file offsets from the beginning of a file in the call.
#include <unistd.h>
ssize_t pread(int fd, void *buf, size_t count, off_t offset);
ssize_t pwrite(int fd, const void *buf, size_t count, off_t offset);
...
ssize_t ret = pread(fd, &num, sizeof(int), 0);
ret = pwrite(fd, &num, sizeof(int), 0);
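The calls above can be wrapped as in the following sketch. The helper names store_int_at_start and load_int_at_start are invented for this example; they work on any seekable descriptor, not just a block device.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helpers: store and load one int at byte offset 0 of `fd`.
 * Neither call moves the descriptor's implicit file offset.
 * Both return 0 on success, -1 on error or a short transfer. */
int store_int_at_start(int fd, int value) {
    return pwrite(fd, &value, sizeof value, 0) == (ssize_t)sizeof value ? 0 : -1;
}

int load_int_at_start(int fd, int *value) {
    return pread(fd, value, sizeof *value, 0) == (ssize_t)sizeof *value ? 0 : -1;
}
```

Because the offset is explicit in every call, the order of reads and writes no longer affects where the data lands.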
You do not seek in your program, so what it does is read the first 4 bytes of the device and then write the second 4 bytes.
Try
lseek(fd,0,SEEK_SET);
before the write if you want to write at the beginning of the file.
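Put together, the read, seek, write sequence could look like this sketch. The helper name replace_first_int is made up for the example.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads the int at the start of `fd` into *old_value,
 * rewinds, then overwrites the same bytes with new_value.
 * Returns 0 on success, -1 on error or a short transfer. */
int replace_first_int(int fd, int *old_value, int new_value) {
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1) return -1;
    if (read(fd, old_value, sizeof *old_value) != (ssize_t)sizeof *old_value) return -1;
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1) return -1; /* undo the advance made by read */
    if (write(fd, &new_value, sizeof new_value) != (ssize_t)sizeof new_value) return -1;
    return 0;
}
```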
I know this way of copying files, which I think is pretty much standard way of copying files in C.
#include <stdio.h>
#include <stdlib.h>
int main()
{
char ch, source_file[20], target_file[20];
FILE *source, *target;
printf("Enter name of file to copy\n");
gets(source_file);
source = fopen(source_file, "r");
if( source == NULL )
{
printf("Press any key to exit...\n");
exit(EXIT_FAILURE);
}
printf("Enter name of target file\n");
gets(target_file);
target = fopen(target_file, "w");
if( target == NULL )
{
fclose(source);
printf("Press any key to exit...\n");
exit(EXIT_FAILURE);
}
while( ( ch = fgetc(source) ) != EOF )
fputc(ch, target);
printf("File copied successfully.\n");
fclose(source);
fclose(target);
return 0;
}
But this way opens the file and copies it line by line. The files I want to copy are HUGE and numerous, so this will take a VERY long time. Is there a way to copy these files directly? I know the terminal or command prompt is a completely different thing from the C language, but a simple
cp sourcefile.txt destinationfile.txt
can do the trick.
Are there any such commands or tricks in C that I can use? I cannot use
system("cp sourcefile.txt destinationfile.txt");
command because I am writing a robust program that should work in Linux and windows.
Well, what do you imagine the cp command itself does to copy files? It opens the source file in read mode and the destination file in write mode, and copies everything in binary chunks! OK, more things can be involved if you pass other options to cp, but the copy itself is no more magic than that.
That being said, what you do is not that. You are copying the file character by character. Even if the standard library does some buffering, you are repeatedly calling a function when that could be avoided. And... never use gets. It was deprecated for ages (and removed from the language in C11) because it is insecure. If the user enters a looong file name (more than 19 characters) you get a buffer overflow. And do not forget to test all I/O functions, including the output ones. When writing a huge file to external media such as a USB key, you could run out of space on the device, and your program would still claim it copied the file successfully.
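As one possible replacement for gets, a bounded read with fgets could be sketched as follows. The helper name read_line is invented for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line of at most cap-1 characters into buf
 * and strips the trailing newline, if one was read.
 * Returns 0 on success, -1 on end of file or error. */
int read_line(FILE *in, char *buf, size_t cap) {
    if (cap == 0 || fgets(buf, (int)cap, in) == NULL) return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

A call such as read_line(stdin, source_file, sizeof source_file) can never write past the array. A line that is too long is truncated, and the rest of it stays in the stream for the next read.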
Copying loop could be something like:
#define SIZE 16384
char buffer[SIZE];
size_t crin, crout; /* fread and fwrite return size_t, never a negative value */
int failed = 0;
while ((crin = fread(buffer, 1, SIZE, source)) > 0) {
crout = fwrite(buffer, 1, crin, target);
if (crout != crin) { /* check that everything could be written */
perror("Write error");
failed = 1;
break;
}
}
if (ferror(source)) { /* test read error (removal of removable media, ...) */
perror("Read error");
failed = 1;
}
A low-level optimization here would be to use POSIX functions directly instead of the standard library ones, because once you are doing binary I/O in big chunks, the buffering of the standard library gives no advantage and you simply pay its overhead.
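A rough sketch of such a POSIX-level loop is shown below. The helper name copy_fd and the 64 KiB buffer size are assumptions for the example, not tuned values.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copies everything from in_fd to out_fd.
 * Returns 0 on success, -1 on a read or write error (errno is set). */
int copy_fd(int in_fd, int out_fd) {
    char buffer[65536];
    for (;;) {
        ssize_t got = read(in_fd, buffer, sizeof buffer);
        if (got == 0) return 0;            /* end of file */
        if (got < 0) {
            if (errno == EINTR) continue;  /* interrupted by a signal, retry */
            return -1;
        }
        ssize_t done = 0;
        while (done < got) {               /* write(2) may write less than asked */
            ssize_t put = write(out_fd, buffer + done, (size_t)(got - done));
            if (put < 0) {
                if (errno == EINTR) continue;
                return -1;
            }
            done += put;
        }
    }
}
```

The inner loop is there because write(2) is allowed to accept only part of the buffer, which fwrite normally hides from the caller.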
This is how I have moved a file in the past without having to open it:
#include <stdio.h>
int main()
{
rename("C:\\oldFile.txt", "C:\\newfile.txt");
return 0;
}
One thing to be aware of is that you're copying in the slowest possible way, because you're doing it character by character. One improvement would be to copy full lines or bigger text chunks, using fgets and fputs.
Even better is to not copy the file as a text file, but instead just as a binary chunk. This is achieved by opening the file in binary mode with the b flag, so e.g. target = fopen(target_file, "wb"); and using fread and fwrite instead of the put character functions.
In both scenarios you have to use a temporary buffer with a reasonable size (could be the size of the file or fixed). To determine the optimal size is not trivial.
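The binary approach could be sketched like this, with a fixed-size temporary buffer. The function name copy_binary and the 16384-byte buffer are assumptions made for the example.

```c
#include <stdio.h>

/* Hypothetical helper: copies src_path to dst_path as raw bytes.
 * Returns 0 on success, -1 on any open, read, write or close error. */
int copy_binary(const char *src_path, const char *dst_path) {
    FILE *src = fopen(src_path, "rb");
    if (!src) return -1;
    FILE *dst = fopen(dst_path, "wb");
    if (!dst) { fclose(src); return -1; }

    char buffer[16384];   /* fixed size chosen arbitrarily for this sketch */
    size_t got;
    int ok = 1;
    while ((got = fread(buffer, 1, sizeof buffer, src)) > 0) {
        if (fwrite(buffer, 1, got, dst) != got) { ok = 0; break; }
    }
    if (ferror(src)) ok = 0;
    fclose(src);
    if (fclose(dst) != 0) ok = 0;   /* buffered data is flushed here, which can fail */
    return ok ? 0 : -1;
}
```

Opening with the b flag matters on platforms that translate line endings in text mode. On POSIX systems the flag is accepted and has no effect.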
Yet another way to copy, and according to my operating systems professor what cp does, is by using memory mapped files.
How to use memory mapped files is unfortunately not portable; it depends on your operating system, i.e. your platform. For Unix, the man page of mmap is your friend. This is an example Unix implementation by me:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdint.h>
#include <errno.h>
#include <time.h>
#include <string.h>
#include <sys/shm.h>
#include <signal.h>
#include <stdbool.h>
#include <assert.h>
#include <sys/time.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
int main(int argc, const char * argv[]) {
if (argc != 3)
{
fprintf(stderr, "Usage %s <SourceFile> <DestinationFile>\n",argv[0]);
return EXIT_FAILURE;
}
int source_file_desc = open(argv[1], O_RDONLY);
if (source_file_desc == -1) {
perror("Can't open source file");
return EXIT_FAILURE;
}
struct stat source_info;
if (stat(argv[1], &source_info) != 0) {
perror("Can't get source file infos");
return EXIT_FAILURE;
}
void *source_mem = mmap(NULL, source_info.st_size, PROT_READ, MAP_FILE|MAP_PRIVATE, source_file_desc, 0);
if (source_mem == MAP_FAILED) {
perror("Mapping source file failed");
return EXIT_FAILURE;
}
int destination_file_desc = open(argv[2], O_TRUNC|O_CREAT|O_RDWR, S_IRUSR|S_IWUSR); /* O_CREAT requires a mode argument */
if (destination_file_desc == -1) {
perror("Can't open destination file");
return EXIT_FAILURE;
}
if (chmod(argv[2], source_info.st_mode) != 0) {
perror("Can't copy file permissions");
}
if (lseek(destination_file_desc, source_info.st_size-1, SEEK_SET) == -1) {
perror("Can't seek to new end of destination file");
}
unsigned char dummy = 0;
if (write(destination_file_desc, &dummy, 1) == -1)
{
perror("Couldn't write dummy byte");
}
void *destination_mem = mmap(NULL, source_info.st_size, PROT_WRITE,MAP_FILE|MAP_SHARED, destination_file_desc,0);
if (destination_mem == MAP_FAILED) {
perror("Mapping destination file failed");
return EXIT_FAILURE; /* do not memcpy into MAP_FAILED */
}
memcpy(destination_mem, source_mem, source_info.st_size);
munmap(source_mem,source_info.st_size);
munmap(destination_mem, source_info.st_size);
close(source_file_desc);
close(destination_file_desc);
return EXIT_SUCCESS;
}
If it’s not a problem that any changes to one copy would affect the other, you can create a link to the file. How this works depends on the OS.
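On POSIX systems, one form of this is a hard link created with link(2). The sketch below assumes both paths are on the same file system; the helper names make_hard_link and same_file are invented for the example.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: makes `new_path` a second name for the file at
 * `existing_path` (a POSIX hard link). No file data is copied.
 * Returns 0 on success, -1 on error. */
int make_hard_link(const char *existing_path, const char *new_path) {
    return link(existing_path, new_path);
}

/* Hypothetical helper: returns 1 if both paths name the same file
 * (same device and inode), 0 if not, -1 on error. */
int same_file(const char *a, const char *b) {
    struct stat sa, sb;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0) return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Both names then refer to the same data, so a write through one name is visible through the other.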
If you want to optimize a file copy as much as possible using only the standard library, here is what I suggest (untested):
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
extern bool copy_file( FILE* dest, FILE* restrict src );
static bool error_helper( const char* file, int line, const char* msg );
#if defined(__amd64) || defined(__amd64__) || defined(__x86_64) || defined(__x86_64__) || defined(_M_X64) || defined(_M_AMD64) || defined(__i386) || defined(_M_IX86) || defined(_X86_) || defined(__X86__) || defined(__I86__) || defined(__INTEL__) || defined(__386)
# define PAGE_SIZE 4096U
#else
# error "Define the page size on your system, or use a system call such as sysconf() to find it."
#endif
#define non_fatal_stdlib_error() error_helper( __FILE__, __LINE__, strerror(errno) )
bool copy_file( FILE* dest, FILE* restrict src )
{
errno = 0;
if ( !(dest = freopen( NULL, "w+", dest )) )
return non_fatal_stdlib_error();
/* Try to help the library out by turning buffering off and allocating an aligned block; it might be able to detect that at runtime.
* On the other hand, the unbuffered implementation might be worse. */
setvbuf( src, NULL, _IONBF, BUFSIZ );
setvbuf( dest, NULL, _IONBF, BUFSIZ );
char* const buffer = aligned_alloc( PAGE_SIZE, PAGE_SIZE );
if (!buffer)
return non_fatal_stdlib_error();
size_t n = fread( buffer, 1, PAGE_SIZE, src );
while ( PAGE_SIZE == n ) {
const size_t written = fwrite( buffer, 1, PAGE_SIZE, dest );
if ( written != PAGE_SIZE )
return non_fatal_stdlib_error();
n = fread( buffer, 1, PAGE_SIZE, src );
} // end while
if (ferror(src))
return non_fatal_stdlib_error();
if ( n > 0 ) {
const size_t written = fwrite( buffer, 1, n, dest );
if ( written != n )
return non_fatal_stdlib_error();
}
free(buffer);
return true;
}
bool error_helper( const char* file, int line, const char* msg )
{
fflush(stdout);
fprintf( stderr, "Error at %s, line %d: %s.\n", file, line, msg );
fflush(stderr);
return false;
}
This at least gives the library implementation a chance to detect that all reads and writes are single memory pages.
I was testing code from APUE, chapter 14 (Advanced I/O), on memory-mapped files. fstat() always returns fdin's st_size as zero; I tried stat() instead and got the same result. The code is listed below (I have removed the apue.h dependencies):
#include <fcntl.h>
#include <sys/mman.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>
#define COPYINCR (1024*1024*1024) /* 1GB */
int main(int argc, char *argv[]) {
if (argc != 3) {
printf("usage: %s <fromfile> <tofile>", argv[0]);
exit(1);
}
int fdin, fdout;
if ((fdin = open(argv[1], O_RDONLY)) < 0) {
printf("can not open %s for reading", argv[1]);
exit(1);
}
if ((fdout = open(argv[2] /* typo fix */, O_RDONLY | O_CREAT | O_TRUNC)) < 0) {
printf("can not open %s for writing", argv[2]);
exit(1);
}
struct stat sbuf;
if (fstat(fdin, &sbuf) < 0) { /* need size fo input file */
printf("fstat error");
exit(1);
}
// always zero, and cause truncate error (parameter error)
printf("input_file size: %lld\n", (long long)sbuf.st_size);
if (ftruncate(fdout, sbuf.st_size) < 0) { /* set output file size */
printf("ftruncate error");
exit(1);
}
void *src, *dst;
off_t fsz = 0;
size_t copysz;
while (fsz < sbuf.st_size) {
if (sbuf.st_size - fsz > COPYINCR)
copysz = COPYINCR;
else
copysz = sbuf.st_size - fsz;
if (MAP_FAILED == (src = mmap(0, copysz, PROT_READ,
MAP_SHARED, fdin, fsz))) {
printf("mmap error for input\n");
exit(1);
}
if (MAP_FAILED == (dst = mmap(0, copysz,
PROT_READ | PROT_WRITE,
MAP_SHARED, fdout, fsz))) {
printf("mmap error for output\n");
exit(1);
}
memcpy(dst, src, copysz);
munmap(src, copysz);
munmap(dst, copysz);
fsz += copysz;
}
return 0;
}
I then tried Python's os.stat, and it also reports zero. Why does this happen? I tried this on Mac OS (Darwin kernel 13.4) and Ubuntu (kernel 3.13) and got the same result on both.
UPDATE:
Oh, there was a typo: fdout should refer to argv[2], and the O_TRUNC flag of course truncated the file behind fdin to zero. Should I close or delete this question?
The reason Python's os.stat() also returns st_size == 0 is that I passed it the same test file (argv[1]), which had already been truncated to zero (I hadn't checked its size with ls -lh before passing it to os.stat()), so of course os.stat() returned zero.
Do not ask SO questions before you go to bed or in a rush.
OK, the real problem is opening the same input file twice, which does not cause any build or runtime error until the ftruncate().
The first open gets a read-only fdin. The second open was meant to create a new file (fdout, truncated) to copy into from fdin via the memory map, but because of the typo it opened and truncated the first file (argv[1]) and wiped all its content. fdin still works with fstat (as it should), which made the cause hard to find.
The second part is that I always used the same file for testing (generated via dd) without checking its size, so os.stat(/path/to/file) and stat(/path/to/file) also returned st_size == 0. This made me believe that some OS-level policy defined the behaviour, so I rushed to Mac OS (using the same code with the typo) and got the same result (they really are consistent at the POSIX level, even in the bug!), and in the end I came to SO for help.
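The effect can be reproduced in isolation with a sketch like the one below. The helper name size_after_second_open is made up, and O_RDWR is used for the second open purely for the demonstration.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper reproducing the typo: opens `path` read-only, opens the
 * same path again with O_TRUNC, then returns the size fstat() reports on the
 * first descriptor. Returns -1 on error. */
off_t size_after_second_open(const char *path) {
    int fdin = open(path, O_RDONLY);
    if (fdin == -1) return -1;
    int fdout = open(path, O_RDWR | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fdout == -1) { close(fdin); return -1; }

    struct stat sbuf;
    off_t size = fstat(fdin, &sbuf) == 0 ? sbuf.st_size : (off_t)-1;
    close(fdout);
    close(fdin);
    return size;
}
```

Both descriptors refer to the same file, so the truncation done through the second open is what fstat sees through the first.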