mmap file returns a pointer to an inaccessible place in memory - c

I have this program that is supposed to mmap a file in read-write mode and be able to edit its contents. Also the file this is written for is about 40-50 GB, so I need mmap64. The problem is, while mmap64 does not return an error, the address it returns is not accessible.
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <unistd.h>
typedef unsigned long long u64;
void access_test(u64 p, u64 sz)
{
u64 i;
char tmp;
for (i=0; i<sz; i++) {
tmp = *(char*)(p+i);
}
}
int main(int argc, char *argv[])
{
int fd;
long long int sz, p;
struct stat buf;
fd = open(argv[1], O_RDWR, 0x0666);
if (fd == -1) {
perror("open");
return 1;
}
fstat64(fd, &buf);
sz = buf.st_size;
printf("File size: 0x%016llx\n", sz);
p = mmap64 (0, buf.st_size, PROT_READ | PROT_WRITE , MAP_SHARED, fd, 0);
if (p == -1) {
perror ("mmap");
return 1;
}
access_test(p,sz);
if (close (fd) == -1) {
perror ("close");
return 1;
}
if (munmap ((void*)p, buf.st_size) == -1) {
perror ("munmap");
return 1;
}
return 0;
}
The result of this is on a small file:
$ ./testmmap minicom.log
File size: 0x0000000000000023
[1] 8282 segmentation fault (core dumped) ./testmmap minicom.log
The same goes for the big one.

Always enable warnings when you compile
Here is the result with warnings enabled:
$ gcc mmp.c -Wall -g
mmp.c: In function ‘access_test’:
mmp.c:18:10: warning: variable ‘tmp’ set but not used [-Wunused-but-set-variable]
char tmp;
^
mmp.c: In function ‘main’:
mmp.c:36:5: warning: implicit declaration of function ‘fstat64’ [-Wimplicit-function-declaration]
fstat64(fd, &buf);
^
mmp.c:40:5: warning: implicit declaration of function ‘mmap64’ [-Wimplicit-function-declaration]
p = mmap64 (0, buf.st_size, PROT_READ | PROT_WRITE , MAP_SHARED, fd, 0);
The last two warnings here are extremely important. They say there is no prototype for fstat64() or mmap64(). C therefore assumes a default declaration that returns int, and that is wrong, at least for the mmap64() call, since an int cannot represent a pointer on a 64-bit Linux host.
The argument to fstat64() must also be a struct stat64 rather than a struct stat, which is another issue.
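To make the truncation concrete, here is a minimal sketch that models it with plain integer arithmetic. The helper name as_implicit_int and the sample address in the usage note are made up for illustration, and the sketch assumes a 32-bit int, as on 64-bit Linux.

```c
#include <stdint.h>

/* Hypothetical helper: models what the caller sees when a function that really
 * returns a 64-bit pointer is treated as returning int and the result is then
 * stored in a long long. Only the low 32 bits survive, sign-extended. */
uint64_t as_implicit_int(uint64_t real_address)
{
    int32_t truncated = (int32_t)(uint32_t)real_address;  /* keep the low 32 bits */
    return (uint64_t)(int64_t)truncated;                  /* widen again, as the assignment to p does */
}
```

For an illustrative mapping address of 0x00007f1234567000, the value that ends up in p would be 0x34567000, which points nowhere useful, hence the segmentation fault on first access.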
Make the specific 64-bit functions available
If you want to make the fstat64()/mmap64() functions available, you need to compile the code with the _LARGEFILE_SOURCE and _LARGEFILE64_SOURCE macros defined, so you should compile this as e.g.:
gcc -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE mmp.c -Wall -g
Or define _FILE_OFFSET_BITS=64
There is however no need to do this. Just call the normal fstat() and mmap() and define _FILE_OFFSET_BITS=64 when compiling, e.g.:
gcc -D_FILE_OFFSET_BITS=64 mmp.c -Wall -g
This enables support for large files and, where needed (e.g. on a 32-bit host), translates the mmap() call to mmap64().
If you are trying to mmap() a 50 GB file, you need to be on a 64-bit host anyway, and on a 64-bit Linux host none of this is necessary: mmap() and fstat() handle large files without any extra work.
Use pointers
The next issue is that you're assigning the return value of mmap() to an integer. This might happen to work, but the code looks odd because of it, and the error check should compare against MAP_FAILED rather than -1. If you want to treat the mapping as a char *, assign it to a char *. Don't play tricks with casting pointers to and from a 64-bit integer type.
E.g. your access function should be:
void access_test(char *p, u64 sz)
{
u64 i;
char tmp;
for (i=0; i<sz; i++) {
tmp = p[i];
}
}
And p should be declared as char *p; in main(), or use uint8_t *p; if you intend to treat the data as binary data.
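Putting the points above together, the following is a sketch of how the corrected program could look, not a definitive implementation. The function name sum_mapped_file is hypothetical, and it adds up the bytes only so that the loop has an observable result.

```c
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t on 32-bit hosts; no effect on 64-bit Linux */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps `path` read-write, reads every byte and stores
 * the byte sum in *sum. Returns 0 on success, -1 on error. */
int sum_mapped_file(const char *path, unsigned long long *sum)
{
    struct stat st;
    unsigned char *p;
    size_t i, len;
    int fd;

    assert(path != NULL && sum != NULL);
    *sum = 0;

    fd = open(path, O_RDWR);
    if (fd == -1) { perror("open"); return -1; }
    if (fstat(fd, &st) == -1) { perror("fstat"); close(fd); return -1; }

    len = (size_t)st.st_size;
    if ((off_t)len != st.st_size) {          /* file does not fit in this address space */
        fprintf(stderr, "file too large to map\n");
        close(fd);
        return -1;
    }
    if (len == 0) { close(fd); return 0; }   /* mmap() rejects a zero length */

    /* Keep the result as a pointer and compare it against MAP_FAILED, not -1. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); close(fd); return -1; }
    close(fd);                               /* the mapping stays valid after close() */

    for (i = 0; i < len; i++)
        *sum += p[i];

    if (munmap(p, len) == -1) { perror("munmap"); return -1; }
    return 0;
}
```

A caller would pass argv[1] and a pointer to an unsigned long long. Compiled with gcc -Wall, this version should no longer produce the implicit-declaration warnings, because it uses only fstat() and mmap().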

Related

Looking for ways to 'mock' posix functions in C/C++ code

I am trying to find somewhat elegant ways to mock and stub function calls to the standard C library functions.
While stubbing-off calls to C files of the project is easy by just linking other C files in the tests, stubbing the standard C functions is harder.
They are just there when linking.
Currently, my approach is to include the code-under-test from my test.cpp file, and placing defines like this:
#include <stdio.h>
#include <gtest/gtest.h>
#include "mymocks.h"
CMockFile MockFile;
#define open MockFile.open
#define close MockFile.close
#define read MockFile.read
#include "CodeUnderTestClass.cpp"
#undef open
#undef close
#undef read
// test-class here
This is cumbersome, and sometimes I run across code that uses 'open' as member names elsewhere or causes other collisions and issues with it. There are also cases of the code needing different defines and includes than the test-code.
So are there alternatives? Some link-time tricks or runtime tricks to override standard C functions? I thought about run-time hooking the functions but that might go too far as usually binary code is loaded read-only.
My unit-tests run only on Debian-Linux with gcc on amd64. So gcc, x64 or Linux specific tricks are also welcome.
I know that rewriting all the code-under-test to use an abstracted version of the C functions is an option, but that hint is not very useful for me.
Use library preloading to substitute system libraries with your own.
Consider following test program code, mytest.c:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
int main(void) {
char buf[256];
int fd = open("file", O_RDONLY);
if (fd >= 0) {
printf("fd == %d\n", fd);
int r = read(fd, buf, sizeof(buf));
write(0, buf, r);
close(fd);
} else {
printf("can't open file\n");
}
return 0;
}
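It will open a file called file from the current directory, print its descriptor number (usually 3), read its content and then write it to descriptor 0. Note that descriptor 0 is standard input; writing to it happens to work when it is a terminal, but standard output is descriptor 1, which is what you would normally use.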
Now here is your test library code, mock.c:
#include <string.h>
#include <unistd.h>
int open(const char *pathname, int flags) {
return 100;
}
int close(int fd) {
return 0;
}
ssize_t read(int fd, void *buf, size_t count) {
strcpy(buf, "TEST!\n");
return 7;
}
Compile it to a shared library called mock.so:
$ gcc -shared -fpic -o mock.so mock.c
If you compiled mytest.c to the mytest binary, run it with following command:
$ LD_PRELOAD=./mock.so ./mytest
You should see the output:
fd == 100
TEST!
Functions defined in mock.c were preloaded and used as a first match during the dynamic linking process, hence executing your code, and not the code from the system libraries.
Update:
If you want to use the "original" functions, you should extract them "by hand" from the proper shared library, using the dlopen, dlsym and dlclose functions. Because I don't want to clutter the previous example, here's a new one: the same as the previous mock.c plus the dynamic symbol loading:
#include <stdio.h>
#include <dlfcn.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <gnu/lib-names.h>
// this declares this function to run before main()
static void startup(void) __attribute__ ((constructor));
// this declares this function to run after main()
static void cleanup(void) __attribute__ ((destructor));
static void *sDlHandler = NULL;
ssize_t (*real_write)(int fd, const void *buf, size_t count) = NULL;
void startup(void) {
char *vError;
sDlHandler = dlopen(LIBC_SO, RTLD_LAZY);
if (sDlHandler == NULL) {
fprintf(stderr, "%s\n", dlerror());
exit(EXIT_FAILURE);
}
real_write = (ssize_t (*)(int, const void *, size_t))dlsym(sDlHandler, "write");
vError = dlerror();
if (vError != NULL) {
fprintf(stderr, "%s\n", vError);
exit(EXIT_FAILURE);
}
}
void cleanup(void) {
dlclose(sDlHandler);
}
int open(const char *pathname, int flags) {
return 100;
}
int close(int fd) {
return 0;
}
ssize_t read(int fd, void *buf, size_t count) {
strcpy(buf, "TEST!\n");
return 7;
}
ssize_t write(int fd, const void *buf, size_t count) {
if (fd == 0) {
real_write(fd, "mock: ", 6);
}
real_write(fd, buf, count);
return count;
}
Compile it with:
$ gcc -shared -fpic -o mock.so mock.c -ldl
Note the -ldl at the end of the command.
So: the startup function runs before main (so you don't need to put any initialization code in your original program) and initializes real_write to the original write function. The cleanup function runs after main, so you don't need to add any cleanup code at the end of main either.
All the rest works exactly the same as in the previous example, with the exception of newly implemented write function. For almost all the descriptors it will work as the original, and for file descriptor 0 it will write some extra data before the original content. In that case the output of the program will be:
$ LD_PRELOAD=./mock.so ./mytest
fd == 100
mock: TEST!

unable to compile program due to function asprintf

I found this code, which is needed to stop the CPU being throttled to 20% on Dell laptops; the throttling occurs when the computer fails to recognize the power adapter.
Tried to compile on Kubuntu and got this:
warning: implicit declaration of function ‘asprintf’; did you mean ‘vasprintf’? [-Wimplicit-function-declaration]
47 | if (asprintf(&concat_cmd, "%s %i", cmd, *reg_value) == -1)
| ^~~~~~~~
| vasprintf
I don't understand why this is happening. I read that asprintf is part of libiberty-dev. The library is installed, but it still does not work. I also added
#include <libiberty/libiberty.h>
and got the same - implicit declaration of function ‘asprintf’
What should I do about it?
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <libiberty/libiberty.h>
#define BUFSIZE (64)
int get_msr_value(uint64_t *reg_value) {
const char *cmd = "rdmsr -u 0x1FC";
char cmd_buf[BUFSIZE];
FILE *fp;
if ((fp = popen(cmd, "r")) == NULL) {
printf("Error opening pipe!\n");
return -1;
}
cmd_buf[strcspn(fgets(cmd_buf, BUFSIZE, fp), "\n")] = 0;
*reg_value = atoi(cmd_buf);
if (pclose(fp)) {
printf("Command not found or exited with error status\n");
return -1;
}
return 0;
}
int main(void) {
const char *cmd = "wrmsr -a 0x1FC";
char *concat_cmd;
int ret;
uint64_t *reg_value = &(uint64_t){ 0 };
if ((ret = get_msr_value(reg_value))) {
return ret;
}
printf("Old register value: %lu\n", *reg_value);
*reg_value = *reg_value & 0xFFFFFFFE; // clear bit 0
printf("New register value: %lu\n", *reg_value);
if (asprintf(&concat_cmd, "%s %i", cmd, *reg_value) == -1)
return -1;
printf("Executing: %s\n", concat_cmd);
system(concat_cmd);
free(concat_cmd);
return 0;
}
asprintf is part of stdio.h, but you need to add #define _GNU_SOURCE at the top of your file and use -std=gnu99 when compiling.
The function asprintf() is not part of the C Standard. It is a GNU extension available in the GNU libc, which your system most likely uses, and it is declared in <stdio.h>.
You need to define _GNU_SOURCE before including <stdio.h> (and before any other system header) for the compiler to see this declaration. Do not define __USE_GNU yourself; it is an internal glibc macro. Run man asprintf to confirm which feature test macro to use, or look inside the file /usr/include/stdio.h on your system.
Either modify the source code or add -D_GNU_SOURCE to your CFLAGS in the Makefile.
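As a minimal sketch of the fix, assuming glibc: the feature test macro goes on the first line of the file, and the format specifier is changed to match uint64_t. The helper name build_command is made up for illustration.

```c
#define _GNU_SOURCE      /* exposes asprintf() in glibc; must precede all includes */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc()ed "cmd value" string, or NULL on
 * failure. The caller frees the result. */
char *build_command(const char *cmd, uint64_t value)
{
    char *out = NULL;

    /* PRIu64 matches uint64_t; "%i" expects an int and would be wrong here. */
    if (asprintf(&out, "%s %" PRIu64, cmd, value) == -1)
        return NULL;
    return out;
}
```

In the program above, the call would become concat_cmd = build_command(cmd, *reg_value), followed by a NULL check before system() and the existing free().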

How to Override A C System Call?

So the problem is the following. The project needs to intercept all file I/O operations, like open() and close(). I am trying to add a printf() before calling the corresponding open() or close(). I am not supposed to rewrite the source code by changing open() or close() to myOpen() or myClose(), for example. I have been trying to use the LD_PRELOAD environment variable, but I ran into an infinite loop. My problem is like this one.
int open(char * path,int flags,int mode)
{
// print file name
printf("open :%s\n",path);
return __open(path,flags,mode);
}
Yes, you want LD_PRELOAD.
You need to create a shared library (.so) that contains code for all the functions you want to intercept, and set LD_PRELOAD to use that shared library.
Here is some sample code for the open function. You'll need to do something similar for each function you want to intercept:
#define _GNU_SOURCE
#include <dlfcn.h>
int
open(const char *file,int flags,int mode)
{
static int (*real_open)(const char *file,int flags,int mode) = NULL;
int fd;
if (real_open == NULL)
real_open = dlsym(RTLD_NEXT,"open");
// do whatever special stuff ...
fd = real_open(file,flags,mode);
// do whatever special stuff ...
return fd;
}
I believe RTLD_NEXT is the easiest approach and may be sufficient. Otherwise, you could add a constructor that calls dlopen once on libc.
UPDATE:
I am not familiar with C and I got the following problems with gcc. "error: 'NULL' undeclared (first use in this function)",
This is defined by several #include files, so try #include <stdio.h>. You'll need that if you want to call printf.
"error: 'RTLD_NEXT' undeclared (first use in this function)",
That is defined by doing #include <dlfcn.h> [as shown in my example]
and "symbol lookup error: ./hack_stackoverflow.so: undefined symbol: dlsym".
man dlsym says: "Link with -ldl". So add -ldl to the line that builds your .so.
Also, you have to be careful to prevent infinite recursion if the "special stuff" does something that loops back on your intercept function.
Notably, you want to call printf. If you intercept the write syscall, bad things may happen.
So, you need to keep track of when you're already in one of your intercept functions and not do anything special if already there. See the in_self variable.
#define _GNU_SOURCE
#include <stdio.h>
#include <dlfcn.h>
ssize_t
write(int fd,const void *buf,size_t len)
{
static ssize_t (*real_write)(int fd,const void *buf,size_t len) = NULL;
static int in_self = 0;
ssize_t err;
if (real_write == NULL)
real_write = dlsym(RTLD_NEXT,"write");
++in_self;
if (in_self == 1)
printf("mywrite: fd=%d buf=%p len=%zu\n",fd,buf,len);
err = real_write(fd,buf,len);
if (in_self == 1)
printf("mywrite: fd=%d buf=%p err=%zd\n",fd,buf,err);
--in_self;
return err;
}
The above works okay for single threaded programs/environments, but if you're intercepting an arbitrary one, it could be multithreaded.
So, we'd have to initialize all the real_* pointers in a constructor. This is a function with a special attribute that tells the dynamic loader to call the function ASAP automatically.
And, we have to put in_self into thread local storage. We do this by adding the __thread attribute.
You may need to link with -lpthread as well as -ldl for the multithreaded version.
Edit: We also have to preserve the correct errno value
Putting it all together:
#define _GNU_SOURCE
#include <stdio.h>
#include <dlfcn.h>
#include <errno.h>
static int (*real_open)(const char *file,int flags,int mode) = NULL;
static ssize_t (*real_write)(int fd,const void *buf,size_t len) = NULL;
__attribute__((constructor))
void
my_lib_init(void)
{
real_open = dlsym(RTLD_NEXT,"open");
real_write = dlsym(RTLD_NEXT,"write");
}
int
open(const char *file,int flags,int mode)
{
int fd;
// do whatever special stuff ...
fd = real_open(file,flags,mode);
// do whatever special stuff ...
return fd;
}
ssize_t
write(int fd,const void *buf,size_t len)
{
static __thread int in_self = 0;
int sverr;
ssize_t ret;
++in_self;
if (in_self == 1)
printf("mywrite: fd=%d buf=%p len=%zu\n",fd,buf,len);
ret = real_write(fd,buf,len);
// preserve errno value for actual syscall -- otherwise, errno may
// be set by the following printf and _caller_ will get the _wrong_
// errno value
sverr = errno;
if (in_self == 1)
printf("mywrite: fd=%d buf=%p ret=%zd\n",fd,buf,ret);
--in_self;
// restore correct errno value for write syscall
errno = sverr;
return ret;
}

error opening file for reading: Value too large for defined data type

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <iostream>
using namespace std;
#define FILEPATH "file.txt"
#define NUMINTS (268435455)
#define FILESIZE (NUMINTS * sizeof(int))
int main()
{
int i=sizeof(int);
int fd;
double *map; //mmapped array of int's
fd = open(FILEPATH, O_RDONLY);
if (fd == -1) {
perror("Error opening file for reading");
exit(EXIT_FAILURE);
}
map = (double*)mmap(0, FILESIZE, PROT_READ, MAP_SHARED, fd, 0);
if (map == MAP_FAILED) {
close(fd);
perror("Error mmapping the file");
exit(EXIT_FAILURE);
}
for (i = 100000; i <=100100; ++i) {
cout<<map[i]<<endl;
}
if (munmap(map, FILESIZE) == -1) {
perror("Error un-mmapping the file");
}
close(fd);
return 0;
}
I am getting an error saying that the file size is too large.
You should check that your mmap-ed file is large enough.
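And make sure FILESIZE is an int64_t value (you need #include <stdint.h>). Cast both factors, because sizeof yields an unsigned size_t that would otherwise make the whole product unsigned:
#define FILESIZE ((int64_t)NUMINTS * (int64_t)sizeof(int))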
Before your mmap call and after the successful open, add the following code, using fstat(2):
struct stat st={0};
if (fstat(fd, &st)) { perror("fstat"); exit (EXIT_FAILURE); };
if ((int64_t)st.st_size < FILESIZE) {
fprintf(stderr, "file %s has size %lld but need %lld bytes\n",
FILEPATH, (long long) st.st_size, (long long) FILESIZE);
exit(EXIT_FAILURE);
}
Finally, compile your program with g++ -Wall -g and use the gdb debugger. strace(1) could also be useful. And be sure that the file system for the current directory can deal with large files.
You may want or need to define _LARGEFILE64_SOURCE (and/or _GNU_SOURCE) e.g. compile with g++ -Wall -g -D_LARGEFILE64_SOURCE=1 ; see lseek64(3) & feature_test_macros(7)
addenda
Googling for
Value too large for defined data type
quickly turns up this coreutils FAQ, which has a detailed explanation. You should probably install a 64-bit Linux distribution (or at least recompile your coreutils appropriately configured, or use a different file system...)
I had this when I was trying to process a 2,626,351,763 byte file (which is more than fits in a signed 32-bit integer). The problem went away when I recompiled my program with cc -m64 (I am using the Sun C 5.13 SunOS_sparc 2014/10/20 compiler).
A 64-bit system is happy to deal with big (>2^31 byte) files, but an application compiled in 32-bit mode is not.
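To check which case a given build falls into, a small sketch along these lines may help. The helper names off_t_bits and open_checked are made up for illustration; EOVERFLOW is the errno value whose message is "Value too large for defined data type".

```c
#define _FILE_OFFSET_BITS 64   /* request a 64-bit off_t; must precede all includes */
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: width of off_t in bits for this build. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}

/* Hypothetical helper: opens `path` read-only and names the large-file case
 * explicitly. Returns the descriptor, or -1 on error. */
int open_checked(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd == -1 && errno == EOVERFLOW)
        fprintf(stderr, "%s: file too large for a %d-bit off_t\n", path, off_t_bits());
    else if (fd == -1)
        perror(path);
    return fd;
}
```

If off_t_bits() reports 32, the program was built without large-file support, and open() on a file of 2 GiB or more is expected to fail with EOVERFLOW.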

C: stdin and std* errs

I want to manipulate Stdin and then Std*. But I am getting the following errors,
$ gcc testFd.c
testFd.c:9: error: initializer element is not constant
testFd.c:9: warning: data definition has no type or storage class
testFd.c:10: error: redefinition of `fd'
testFd.c:9: error: `fd' previously defined here
testFd.c:10: error: `mode' undeclared here (not in a function)
testFd.c:10: error: initializer element is not constant
testFd.c:10: warning: data definition has no type or storage class
testFd.c:12: error: syntax error before string constant
The program is shown below.
#include <stdio.h>
#include <sys/ioctl.h>
int STDIN_FILENO = 1;
// I want to access typed
// Shell commands, dunno about the value:
unsigned long F_DUPFD;
fd = fcntl(STDIN_FILENO, F_DUPFD, 0);
fd = open("/dev/fd/0", mode);
printf("STDIN = %s", fd);
Updated Errors: just trying to get an example program about file descriptors to work in C, pretty lost with the err report
#include <stdio.h>
#include <sys/ioctl.h>
int main (void) {
int STDIN_FILENO;
// I want to access typed
// Shell commands, dunno about the value:
unsigned long F_DUPFD;
int fd;
const char mode = 'r';
fd = fcntl(STDIN_FILENO, F_DUPFD, 0);
/* also, did you mean `fopen'? */
fd = fopen("/dev/fd/0", mode);
printf("STDIN = %s", fd);
return 0;
}
The program execution is shown below.
$ gcc testFd.c
testFd.c: In function `main':
testFd.c:14: warning: passing arg 2 of `fopen' makes pointer from integer without a cast
testFd.c:14: warning: assignment makes integer from pointer without a cast
Try using a main function:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
int main (void) {
/* STDIN_FILENO and F_DUPFD are constants from <unistd.h> and <fcntl.h>; do not redeclare them */
/* also, declare the type of your variable "fd" */
int fd;
fd = fcntl(STDIN_FILENO, F_DUPFD, 0);
printf("duplicate of STDIN = %d\n", fd);
/* open() takes flags such as O_RDONLY, not an fopen()-style mode string */
fd = open("/dev/fd/0", O_RDONLY);
/* a file descriptor is an int, so print it with %d, not %s */
printf("STDIN = %d\n", fd);
return 0;
}
You forgot your main() function!!
Where's your definition of main()?
Quite apart from the fact that you don't have a main() function, your entire approach is wrong. STDIN_FILENO is a constant; assigning to it doesn't make any sense.
Try explaining what you actually want to do, with some detail, and we will be able to suggest how to go about it.
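If the goal is simply to read what was typed on standard input through a duplicated descriptor, a sketch could look like the following. This is a guess at the intent, and the helper name read_via_dup is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: duplicates `src_fd`, reads up to cap-1 bytes from the
 * duplicate into buf, NUL-terminates the result, and closes the duplicate.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_via_dup(int src_fd, char *buf, size_t cap)
{
    ssize_t n;
    int fd;

    if (cap == 0)
        return -1;
    fd = fcntl(src_fd, F_DUPFD, 0);   /* lowest free descriptor >= 0 */
    if (fd == -1)
        return -1;
    n = read(fd, buf, cap - 1);
    close(fd);
    if (n >= 0)
        buf[n] = '\0';
    return n;
}
```

Calling read_via_dup(STDIN_FILENO, buf, sizeof buf) from main() would then read one chunk of typed input. STDIN_FILENO comes from <unistd.h> and is never assigned to.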

Resources