Using MapViewOfFile, pointer eventually walks out of memory space - c

All,
I'm using MapViewOfFile to hold part of a file in memory. There is a stream that points to this file and writes to it, and then is rewound. I use the pointer to the beginning of the mapped file, and read until I get to the null char I write as the final character.
int fd;
yyout = tmpfile();
fd = fileno(yyout);
#ifdef WIN32
HANDLE fm;
HANDLE h = (HANDLE) _get_osfhandle (fd);
fm = CreateFileMapping(
h,
NULL,
PAGE_READWRITE|SEC_RESERVE,
0,
4096,
NULL);
if (fm == NULL) {
fprintf (stderr, "%s: Couldn't access memory space! %s\n", argv[0], strerror (GetLastError()));
exit(GetLastError());
}
bp = (char*)MapViewOfFile(
fm,
FILE_MAP_ALL_ACCESS,
0,
0,
0);
if (bp == NULL) {
fprintf (stderr, "%s: Couldn't fill memory space! %s\n", argv[0], strerror (GetLastError()));
exit(GetLastError());
}
Data is sent to the yyout stream, until flushData() is called. This writes a null to the stream, flushes, and then rewinds the stream. Then I start from the beginning of the mapped memory, and read chars until I get to the null.
void flushData(void) {
/* write out data in the stream and reset */
fprintf(yyout, "%c%c%c", 13, 10, '\0');
fflush(yyout);
rewind(yyout);
if (faqLine == 1) {
faqLine = 0; /* don't print faq's to the data file */
}
else {
char * ps = bp;
while (*ps != '\0') {
fprintf(outstream, "%c%c", *ps, blank);
ps++;
}
fflush(outfile);
}
fflush(yyout);
rewind(yyout);
}
After flushing, more data is written to the stream, which should be set to the start of the memory area. As near as I can determine with gdb, the stream is not getting rewound, and eventually fills up the allocated space.
Since the stream points to the underlying file, this does not cause a problem initially. But when I attempt to walk the memory, I never find the null. This leads to a SIGSEGV. If you want more details of why I need this, see here.
Why am I not reusing the memory space as expected?

I think this line from the MSDN documentation for CreateFileMapping might be the clue.
A mapped file and a file that is accessed by using the input and output (I/O) functions (ReadFile and WriteFile) are not necessarily coherent.
You're apparently not using ReadFile/WriteFile directly, but the documentation should be understood in terms of mapped views versus explicit I/O calls. In any case, the C RTL is surely implemented using the Win32 API, so your fprintf and fflush calls end up as WriteFile calls underneath.
In short, this approach is problematic.
I don't know why changing the view/file size helps; perhaps it just shifts the undefined behaviour in a direction that happens to be beneficial.
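One way to sidestep the coherence question entirely is to read the data back through the same stdio stream instead of through a mapped view. The following is only a minimal sketch of that alternative, not the asker's code: flush_via_stdio is a hypothetical helper, and it assumes the scratch stream comes from tmpfile() (binary mode, so ftell counts bytes). It records the record length rather than searching for a terminating null, so stale bytes left over from a longer, earlier record are never read.

```c
#include <stdio.h>

/* Hypothetical helper: writes `text` plus CR LF to the scratch stream, then
   copies exactly what was written to `out`, following each character with
   `sep`. Returns the number of characters copied, or -1 on error. */
long flush_via_stdio(FILE *scratch, FILE *out, const char *text, char sep)
{
    long len, i;
    int c;

    rewind(scratch);                    /* every record starts at offset 0 */
    if (fprintf(scratch, "%s\r\n", text) < 0)
        return -1;
    len = ftell(scratch);               /* length of this record in bytes */
    if (len < 0)
        return -1;
    rewind(scratch);                    /* switch from writing to reading */

    for (i = 0; i < len; i++) {
        c = getc(scratch);
        if (c == EOF)
            return -1;
        if (putc(c, out) == EOF || putc(sep, out) == EOF)
            return -1;
    }
    rewind(scratch);                    /* ready for the next record */
    return len;
}
```

Called as flush_via_stdio(yyout, outstream, text, blank), this would play the role of the copy loop in flushData(), with no mapping involved.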

Well, after working on this for a while, I have a working solution. I don't know why this succeeds, so if someone comes up with something better, I'll be happy to accept their answer instead.
fm = CreateFileMapping(
h,
NULL,
PAGE_READWRITE|SEC_RESERVE,
0,
16384,
NULL);
As you can see, the only change is to the size declared from 4096 to 16384. Why this works when the total chars input at a time is no more than 1200, I don't know. If someone could provide details on this, I would appreciate it.

When you're done with the map, simply un-map it.
UnmapViewOfFile(bp);

Related

Use mmap to put content into an allocated memory region

I have tried to read documentation on mmap but I am still having a hard time understanding how to use it.
I want to take an argument from the command line and then allocate it to an executable memory region. Then I want to be able to execute from that code.
This is what I have so far:
int main(int argc, char const *argv[]) {
if (argc != 2) {
printf("Correct input was not provided\n");
exit(1);
}
char assembly_code[sizeof argv[1]];
const char *in_value = argv[1];
int x = sscanf(in_value, "%02hhx", assembly_code);
if (x != 1) {
printf("sscanf failed, exited\n");
exit(1);
}
void * map;
size_t ac_size = sizeof(assembly_code) / sizeof(assembly_code[0]);
map = mmap(NULL, ac_size, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_ANONYMOUS, -1, 0);
if (map == MAP_FAILED) {
printf("Mapping failed\n");
exit(1);
}
((void (*)(void))map)();
return 0;
}
This is the output/error I am getting: Mapping failed
I don't know if I am using mmap correctly. And if I am I don't believe I am executing it correctly.
For example if this file is run with an argument e8200000004889c64831c048ffc04889c74831d2b2040f054831c0b83c0000004831ff0f05e806000000584883c008c3ebf84869210a it should return Hi! and then terminate. I don't really know how to get this output after the map or how to "call/execute" a mmap.
int x = sscanf(read, "%02hhx", assembly_code);
This only converts a single byte. I don't think there is a standard library function to convert a whole string of hex bytes; you'll have to call sscanf in a loop, incrementing pointers within the input buffer read and the output buffer assembly_code as appropriate.
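In your mmap call, if you don't want to map a file, you need the MAP_ANONYMOUS flag, and it has to be combined with exactly one of MAP_PRIVATE or MAP_SHARED (for example MAP_PRIVATE|MAP_ANONYMOUS). On Linux, passing MAP_ANONYMOUS on its own makes mmap fail with EINVAL, which would explain the "Mapping failed" output. You should also check its return value (it returns MAP_FAILED on error) and report any error, for instance with perror. Also, the return value is in principle of type void * instead of int * (though this should make no difference on common Unix systems).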
The mmap call (after being fixed) will return a pointer to a block of memory filled with zeros. There is no reason for it to contain your assembly_code, so you'll have to copy it there, perhaps with memcpy. Or you could move your loop and parse the hex bytes directly into the region allocated with mmap.
Your ((void (*)(void))map)(); to transfer control to the mapped address appears to be correct, so once you have the block correctly mapped and populated, it should execute your machine code. I haven't tried to disassemble the hex string you provided to see if it actually would do what you are expecting, but hopefully it does.
Also, read is not a very good name for a variable, since there is also a standard library function by that name. Your code will work, since the local variable shadows the global object, but it would be clearer to choose another name.
Oh, and char assembly_code[sizeof argv[1]]; isn't right. Think carefully about what sizeof does and doesn't do.
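To make the loop and the copy step concrete, here is a rough sketch that decodes the hex string straight into an anonymous mapping. load_hex is a hypothetical name, and the mapping is deliberately only readable and writable here; to run the bytes you would add PROT_EXEC to the mmap call (or call mprotect afterwards), which some hardened systems refuse.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: decodes a string of hex digit pairs into a fresh
   anonymous mapping. Returns the mapping and stores its length in *len_out,
   or returns NULL if the input is malformed or mmap fails. */
unsigned char *load_hex(const char *hex, size_t *len_out)
{
    size_t digits = strlen(hex);
    size_t len = digits / 2;
    size_t i;
    unsigned char *map;

    if (digits == 0 || digits % 2 != 0)
        return NULL;

    /* MAP_ANONYMOUS must be paired with MAP_PRIVATE or MAP_SHARED. */
    map = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (map == MAP_FAILED)
        return NULL;

    for (i = 0; i < len; i++) {
        unsigned int byte;
        /* Check both characters first, then convert one pair per call. */
        if (!isxdigit((unsigned char)hex[2 * i]) ||
            !isxdigit((unsigned char)hex[2 * i + 1]) ||
            sscanf(hex + 2 * i, "%2x", &byte) != 1) {
            munmap(map, len);
            return NULL;
        }
        map[i] = (unsigned char)byte;
    }
    *len_out = len;
    return map;
}
```

With an executable mapping, the call from the question, ((void (*)(void))map)();, would then jump to the decoded bytes. Note that the length comes from strlen(argv[1]) / 2, not from sizeof.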

Fread Abort 6 error

In my code I am sending packets, each carrying 128 bytes from a text file, so I need to read the data in from the file piece by piece (I can't just allocate a buffer and read all of it before sending because the file will be extremely large). For some reason I am getting an Abort 6 error even though I have allocated memory.
SendIndex starts as 0 and it aborts for the first send so that shouldn't be the problem.
The problem occurs during strcpy; I just don't know why.
Really confused so I would really appreciate the help.
struct packet packingT;
packingT.header = mpHeaderT;
packingT.data = (char*) calloc(512,sizeof(char));
char* sendString = (char*)calloc(128,sizeof(char));
FILE *file = fopen(receivedStruct->fileTitle, "rb");
if(file == NULL) {
printf("Error - Can't Open File\n");
exit(0);
}
fseek(file, 128*sendIndex, SEEK_SET);
fread(sendString, 128, 1,file);
fclose(file);
// sendString[128] = '\0'; <--- Still don't know if this is needed
packingT.header->seq_num = receivedStruct->nextSeqNum;
strcpy(packingT.data, sendString);
I think all you need to do is replace the final strcpy with memcpy instead. That is, the last line should be memcpy(packingT.data, sendString, 128);
(Edit: The reason being that strcpy determines the length of the thing to be copied by scanning for a zero at the end. You're reading arbitrary data, which may have zeros in the middle, and may not always end in a zero)
(Edit2: please be aware that the content of packingT.data is not terminated, so you can't use string functions on it. Depending on what you're doing, you might need to add a terminator, or ensure one gets written to the file)
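As a sketch of the length-based approach (read_chunk and CHUNK_SIZE are made-up names, not part of the asker's code), the byte count returned by fread can drive the memcpy, so zero bytes inside the data and a short final chunk are both handled:

```c
#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE 128

/* Hypothetical helper: reads chunk number `index` of `file` into `dest`,
   which must hold at least CHUNK_SIZE bytes. Returns the number of bytes
   actually read (fewer than CHUNK_SIZE for the last chunk), or 0 on a seek
   error or at end of file. */
size_t read_chunk(FILE *file, long index, char *dest)
{
    char chunk[CHUNK_SIZE];
    size_t got;

    if (fseek(file, index * CHUNK_SIZE, SEEK_SET) != 0)
        return 0;
    /* Element size 1, so the return value is a byte count. */
    got = fread(chunk, 1, sizeof chunk, file);
    /* Copy by length: the data may contain zero bytes and has no terminator. */
    memcpy(dest, chunk, got);
    return got;
}
```

The returned count is also what the sender would put on the wire, instead of calling strlen on the packet data.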

Parse This IP To Be The Right Length?

For something I am doing I would like to get the external IP of the PC running the program (written in C). So far I have found the best way is to connect to a site that simply displays the IP of the visitor, and then parse the webpage for the IP. The first part was easy, but when I display the buffer I read the page into (the page visibly consisted of only my IP) I get a few random extra symbols/characters after the IP. Here is the code I am using ATM (simplified to exclude other stuff):
HINTERNET OpenInternet = NULL;
HINTERNET GetIP = NULL;
DWORD BytesRead = 0;
char IPGrabbed[30];
OpenInternet = InternetOpen("Microsoft Internet Explorer", INTERNET_OPEN_TYPE_DIRECT, NULL, NULL, 0);
if (OpenInternet == NULL) {
return 1;
}
GetIP = InternetOpenUrl(OpenInternet, "http://api.externalip.net/ip/", NULL, 0, INTERNET_FLAG_RELOAD, 0);
if (GetIP == NULL)
return 1;
if (!InternetReadFile(GetIP, &IPGrabbed, sizeof(IPGrabbed), &BytesRead))
return 1;
printf("IP: %s", IPGrabbed);
getchar();
I also tried parsing through IPGrabbed stopping at any '\n' or '\r' (because it displays the weird characters on the line below the IP when I printf() it) and then copying everything up till there to another char array, but got the same result. Could anyone help me figure out what is going on here? Thank you.
Initialise the buffer to all 0s and then read one character less than the buffer can hold.
This way the 0-terminator a C "string" relies on is provided implicitly.
char IPGrabbed[30] = ""; /* Initialise the buffer to all `0`s ... */
[...]
/* ... and then read one character less than the buffer can hold. */
if (!InternetReadFile(GetIP, &IPGrabbed, sizeof(IPGrabbed) - 1, &BytesRead))
return 1;
fprintf(stderr, "IP: %s", IPGrabbed); /* Print to stderr: it is not buffered, so
everything appears on the console immediately. */
The result from InternetReadFile is not null-terminated; you need to add a null character to the end of the string yourself after the read succeeds:
IPGrabbed[BytesRead] = 0;
Edit 1
As suggested in the comment by Jonathan Potter, the above code is open to a buffer overflow if the site returns anything longer than an IP string (at most 15 characters for IPv4, 16 with the terminator): once the read fills the whole buffer, BytesRead equals sizeof(IPGrabbed) and IPGrabbed[BytesRead] writes one past the end.
To eliminate the problem, change the InternetReadFile call to read one byte less than the buffer length:
InternetReadFile(GetIP, &IPGrabbed, sizeof(IPGrabbed)-1, &BytesRead)
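The two answers can be combined into one small, portable helper. This is only a sketch under the assumption that the buffer holds bytes_read unterminated bytes (bytes_read stands in for BytesRead); terminate_line is a hypothetical name and does not call any WinINet function itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: `buf` has capacity `cap` and holds `bytes_read` bytes
   of unterminated data. Terminates the data, cuts it at the first CR or LF,
   and returns the resulting string length. */
size_t terminate_line(char *buf, size_t cap, size_t bytes_read)
{
    if (cap == 0)
        return 0;
    if (bytes_read > cap - 1)          /* never write past the end of buf */
        bytes_read = cap - 1;
    buf[bytes_read] = '\0';            /* terminate what was actually read */
    buf[strcspn(buf, "\r\n")] = '\0';  /* drop a trailing line ending */
    return strlen(buf);
}
```

After a successful read, terminate_line(IPGrabbed, sizeof(IPGrabbed), BytesRead) would leave a string that is safe to pass to printf.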

C - Inaccurate write() position obtained with lseek() when using multiple threads

I'm having a problem getting the correct file position at which I'm writing when simultaneously writing to different parts of the same file using multiple threads.
I have one global file descriptor to the file. In my writing function, I first lock a mutex, then do lseek(global_fd, 0, SEEK_CUR) to get the current file position. I next write 31 zero bytes (31 is my entry size) using write(), in effect to reserve space for later. I then unlock the mutex.
Later in the function, I declare a local fd variable to the same file, and open it. I now do an lseek on that local fd to get to the position I learned from earlier, where my space is reserved. Finally, I write() 31 data bytes there for the entry, and close the local fd.
The issue seems to be that rarely, an entry doesn't get written to the expected location (it's not mangled data - it seems that either it is swapped with a different entry, or two entries were written to the same location). There are multiple threads running that "writing function" I described.
I since learned that pwrite() can be used to write to a specific offset, which would be more efficient, and eliminate the lseek(). However, I first want to find out: what is wrong with my original algorithm? Is there any type of buffering that could be causing the discrepancy between the expected write location, and where the data actually ends up getting stored in the file?
The relevant code snippet is below. The reason this is an issue is that in a second data file, I record the location where the entry I'm writing will be stored. If that location, based on the lseek() before the write, is not accurate, my data doesn't match up properly -- which is what happens on occasion (it's hard to reproduce - it happens in maybe 1 in 100k writes). Thanks!
db_entry_add(...)
{
char dbrecord[DB_ENTRY_SIZE];
int retval;
pthread_mutex_lock(&db_mutex);
/* determine the EOF index, at which we will add the log entry */
off_t ndb_offset = lseek(cfg.curr_fd, 0, SEEK_CUR);
if (ndb_offset == -1)
{
fprintf(stderr, "Unable to determine ndb offset: %s\n", strerror_s(errno, ebuf, sizeof(ebuf)));
pthread_mutex_unlock(&db_mutex);
return 0;
}
/* reserve entry-size bytes at the location, at which we will
later add the log entry */
memset(dbrecord, 0, sizeof(dbrecord));
/* note: db_write() is a write() loop */
if (db_write(cfg.curr_fd, (char *) &dbrecord, DB_ENTRY_SIZE) < 0)
{
fprintf(stderr, "db_entry_add2db - db_write failed!");
close(curr_fd);
pthread_mutex_unlock(&db_mutex);
return 0;
}
pthread_mutex_unlock(&db_mutex);
/* in another data file, we now record that the entry we're going to write
will be at the specified location. if it's not (which is the problem,
on rare occasion), our data will be inconsistent */
advertise_entry_location(ndb_offset);
...
/* open the data file */
int write_fd = open(path, O_CREAT|O_LARGEFILE|O_WRONLY, 0644);
if (write_fd < 0)
{
fprintf(stderr, "%s: Unable to open file %s: %s\n", __func__, cfg.curr_silo_db_path, strerror_s(errno, ebuf, sizeof(ebuf)));
return 0;
}
pthread_mutex_lock(&db_mutex);
/* seek to our reserved write location */
if (lseek(write_fd, ndb_offset, SEEK_SET) == -1)
{
fprintf(stderr, "%s: lseek failed: %s\n", __func__, strerror_s(errno, ebuf, sizeof(ebuf)));
close(write_fd);
return 0;
}
pthread_mutex_unlock(&db_mutex);
/* write the entry */
/* note: db_write_with_mutex is a write() loop wrapped with db_mutex lock and unlock */
if (db_write_with_mutex(write_fd, (char *) &dbrecord, DB_ENTRY_SIZE) < 0)
{
fprintf(stderr, "db_entry_add2db - db_write failed!");
close(write_fd);
return 0;
}
/* close the data file */
close(write_fd);
return 1;
}
One more note, for completeness. I have a similar but simpler routine that could also be causing the problem. This one uses buffered output (FILE*, fopen, fwrite), but performs an fflush() at the end of each write. It writes to a different file than the earlier routine, but could cause the same symptom.
pthread_mutex_lock(&data_mutex);
/* determine the offset at which the data will be written. this has to be accurate,
otherwise it could be causing the problem */
offset = ftell(current_fp);
fwrite(data);
fflush(current_fp);
pthread_mutex_unlock(&data_mutex);
There seem to be several places where things could go wrong. I would make the following changes: (1) be consistent and use the same I/O library throughout, as per bdonlan's suggestion; (2) make the lseek() and the writes one atomic action guarded by a mutex, so that only a single thread at a time can add to both files. SEEK_CUR seeks relative to the current file offset, so would you not want SEEK_END, to seek to the end of the file in order to append there? Then, if you are modifying a particular section of the file, you would use SEEK_SET to reposition to the location you want to write to. You would want to do this inside a mutex-guarded section too, so that only a single thread does the file positioning and the file update.
If you're using your 'simpler routine' at the same time, this could indeed be a problem. If these are separate file descriptors, there's nothing to ensure that they're both pointing at the end of the file at all times (unless you use append mode, however I'm not sure what the semantics around ftell for append mode are). If they're the same fd (ie, you have a raw fd and a FILE * pointing to the same place), you might have problems with the standard library getting confused about where you are in a file, when you use write() to bypass it.
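For reference, the pwrite() variant mentioned in the question could look roughly like the sketch below. It is an illustration under stated assumptions, not a diagnosis of the original bug: reserve_slot, write_entry and the in-memory next_offset counter are hypothetical, the file is assumed to start empty, and the descriptor is assumed not to be opened with O_APPEND (on Linux, pwrite on an O_APPEND descriptor appends regardless of the offset given).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes pwrite() under strict compiler modes */
#endif
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

#define ENTRY_SIZE 31

static pthread_mutex_t slot_mutex = PTHREAD_MUTEX_INITIALIZER;
static off_t next_offset = 0;   /* assumption: the file starts out empty */

/* Hands out the next free ENTRY_SIZE-byte slot. Only this bookkeeping
   needs the lock; no file position is consulted. */
off_t reserve_slot(void)
{
    off_t offset;

    pthread_mutex_lock(&slot_mutex);
    offset = next_offset;
    next_offset += ENTRY_SIZE;
    pthread_mutex_unlock(&slot_mutex);
    return offset;
}

/* Writes one entry at `offset` without touching the descriptor's file
   position. Returns 0 on success, -1 on error. */
int write_entry(int fd, off_t offset, const char *entry)
{
    size_t done = 0;

    while (done < ENTRY_SIZE) {
        ssize_t n = pwrite(fd, entry + done, ENTRY_SIZE - done,
                           offset + (off_t)done);
        if (n <= 0)
            return -1;              /* no EINTR retry in this sketch */
        done += (size_t)n;
    }
    return 0;
}
```

Because the offset is handed out under the lock and never derived from a shared file position, the value passed to advertise_entry_location() and the place the data lands cannot drift apart, and the second open()/lseek() pair is no longer needed.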

Reading a file of text into array with malloc in C

I could use a set of eyes (or more) on this code. I'm trying to read in a set amount of bytes from a filestream (f1) to an array/buffer (file is a text file, array is of char type). If I read in size "buffer - 1" I want to "realloc" the array and then continue to read, starting at where I left off. Basically I'm trying to dynamically expand the buffer for the file of unknown size. What I'm wondering:
Am I implementing this wrong?
How would I check failure conditions on something like "realloc" with the code the way it is?
I'm getting a lot of warnings when I compile about "implicit declaration of built-in function realloc..." (I'm seeing that warning for my use of read, malloc, strlen, etc. as well).
When "read()" gets called a second time (and third, fourth, etc.) does it read from the beginning of the stream each time? That could be my issue, as I only seem to return the first "buff_size" chars.
Here's the snippet:
//read_buffer is of size buff_size
n_read = read(f1, read_buffer, buff_size - 1);
read_count = n_read;
int new_size = buff_size;
while (read_count == (buff_size - 1))
{
new_size *= 2;
read_buffer = realloc(read_buffer, new_size);
n_read = read(f1, read_buffer[read_count], buff_size - 1);
read_count += n_read;
}
As I am learning how to do this type of dynamic read, I'm wondering if someone could state a few brief facts about best practices with this sort of thing. I'm assuming this comes up a TON in the professional world (reading files of unknown size)? Thanks for your time. ALSO: As you guys find good ways of doing things (ie a technique for this type of problem), do you find yourselves memorizing how you did it, or maybe saving it to reference in the future (ie is a solution fairly static)?
If you're going to expand the buffer for the entire file anyway, it's probably easiest to seek to the end, get the current offset, then seek back to the beginning and read it all in one swoop:
size = lseek(f1, 0, SEEK_END); // get offset at end of file
lseek(f1, 0, SEEK_SET); // seek back to beginning
buffer = malloc(size+1); // allocate enough memory.
read(f1, buffer, size); // read in the file
Alternatively, on any reasonably modern POSIX-like system, consider using mmap.
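The seek-then-read snippet above leaves out the error checks. A fuller sketch might look like this, assuming f1 is a seekable descriptor; slurp_fd is a hypothetical name. It also loops, because read() is allowed to return fewer bytes than requested, and it continues from where the previous call stopped, not from the beginning.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reads the whole of a seekable descriptor into a
   malloc'd, NUL-terminated buffer and stores the byte count in *len_out.
   Returns NULL on error; the caller frees the result. */
char *slurp_fd(int fd, size_t *len_out)
{
    off_t size = lseek(fd, 0, SEEK_END);     /* offset at end of file */
    size_t done = 0;
    char *buffer;

    if (size < 0 || lseek(fd, 0, SEEK_SET) < 0)
        return NULL;                         /* e.g. a pipe cannot seek */
    buffer = malloc((size_t)size + 1);
    if (buffer == NULL)
        return NULL;

    while (done < (size_t)size) {            /* read() may return short counts */
        ssize_t n = read(fd, buffer + done, (size_t)size - done);
        if (n < 0) {
            free(buffer);
            return NULL;
        }
        if (n == 0)
            break;                           /* file shrank while reading */
        done += (size_t)n;
    }
    buffer[done] = '\0';
    *len_out = done;
    return buffer;
}
```

Including stdlib.h and unistd.h, as done here, is also what silences the "implicit declaration" warnings mentioned in the question.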
Here's a cool trick: use mmap instead (man mmap).
In a nutshell, say you have your file descriptor f1, on a file of nb bytes. You simply call
char *map = mmap(NULL, nb, PROT_READ, MAP_PRIVATE, f1, 0);
if (map == MAP_FAILED) {
return -1; // handle failure
}
Done.
You can read from the file as if it was already in memory, and the OS will read pages into memory as necessary. When you're done, you can simply call
munmap(map, nb);
and the mapping goes away.
edit: I just re-read your post and saw you don't know the file size. Why?
You can use lseek to seek to the end of the file and learn its current length.
If instead it's because someone else is writing to the file while you're reading, you can read from your current mapping until it runs out, then call lseek again to get the new length, and use mremap (Linux-specific) to increase the size. Or, you could simply munmap what you have, and mmap again with a new "offset" (the last argument, which I set to 0; it is how many bytes from the start of the file to skip, and it must be a multiple of the page size).
#include <stdlib.h> /* for realloc(), free() */
#include <string.h> /* for memcpy() */
#include <unistd.h> /* for read() */
char buff[512]; /* anything goes */
size_t done, size;
char *result = NULL;
int fd; /* an already-open descriptor */
done = size = 0;
while (1) {
    ssize_t n_read;
    n_read = read(fd, buff, sizeof buff);
    if (n_read <= 0) {
        /* for network connections, (n_read == -1 && errno == EAGAIN)
           should be handled specially (by a continue) here. */
        break;
    }
    if (done + n_read > size) {
        size_t new_size = size ? 2 * size : (size_t)n_read;
        char *tmp;
        /* after a short first read, 2*size can still be too small */
        while (new_size < done + n_read)
            new_size *= 2;
        tmp = realloc(result, new_size);
        if (tmp == NULL) {
            /* realloc failed: the old block is still allocated, so free it */
            free(result);
            result = NULL;
            done = 0;
            break;
        }
        result = tmp;
        size = new_size;
    }
    memcpy(result + done, buff, n_read);
    done += n_read;
}
/* ... and maybe shave down result a bit here, e.g. with realloc(result, done) ... */
Note: this is more or less the vanilla way. Another way would be to malloc a real big array first, and realloc to the right size later. That will reduce the number of reallocs, and it might be more gentle for the malloc arena, wrt fragmentation. YMMV.

Resources