I've been tasked with updating a function which currently reads in a configuration file from disk and populates a structure:
static int LoadFromFile(FILE *Stream, ConfigStructure *cs)
{
int tempInt;
...
if (fscanf(Stream, "Version: %d\n", &tempInt) != 1)
{
printf("Unable to read version number\n");
return 0;
}
cs->Version = tempInt;
...
}
to one which allows us to bypass writing the configuration to disk and instead pass it directly in memory, roughly equivalent to this:
static int LoadFromString(char *Stream, ConfigStructure *cs)
A few things to note:
The current LoadFromFile function is incredibly dense and complex, reading dozens of versions of the config file in a backward compatible manner, which makes duplication of the overall logic quite a pain.
The functions that generate the config file and those that read it originate in totally different parts of the old system and therefore don't share any data structures, so I can't pass those directly. I could potentially write a wrapper, but it would again need to handle any structure passed in, in a backwards-compatible manner.
I'm tempted to just pass the file as is in as a string (as in the prototype above) and convert all the fscanf's to sscanf's but then I have to handle incrementing the pointer along (and potentially dealing with buffer overrun errors) manually.
This has to remain in C, so no C++ functionality like streams can help here.
Am I missing a better option? Is there some way to create a FILE * that actually just points to a location in memory instead of on disk? Any pointers, suggestions or other help is greatly appreciated.
If you can't pass structures and must pass the data as a string, then you should be able to tweak your function to read from a string instead of a file. If the function is as complicated as you describe, then converting fscanf->sscanf would possibly be the most straightforward way to go.
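If you do go the sscanf route, the %n conversion takes care of the pointer-advance bookkeeping: it stores the number of characters consumed so far without counting toward the return value. A minimal, hypothetical sketch of that idea, reusing the question's ConfigStructure (untested against the real config format):

static int LoadFromString(char *Stream, ConfigStructure *cs)
{
    int tempInt;
    int consumed = 0;

    /* %n stores how many characters were consumed; it is not counted
       in sscanf's return value, so the != 1 check stays the same. */
    if (sscanf(Stream, "Version: %d\n%n", &tempInt, &consumed) != 1)
    {
        printf("Unable to read version number\n");
        return 0;
    }
    Stream += consumed;   /* advance the cursor past the parsed text */
    cs->Version = tempInt;
    /* ... repeat the same pattern for the remaining fields ... */
    return 1;
}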
Here's an idea using your function prototype above. Read in the entire data string (without processing any of it) and store it in a local buffer. That way the code has random access to the data, as it would with a file, which also makes buffer overruns easier to predict and avoid. Start by mallocing a reasonably sized buffer, copy the data into it, and realloc more space as needed. Once you have a local copy of the entire data buffer, scan through it and extract whatever data you need.
Note that this can get tricky if '\0' characters are valid input. In that case, you would have to add additional logic to test whether this was the end of the input string or just a zero byte (the difficulty depends on the particular format of your data buffer).
Since you are trying to keep the file data in memory, you should be able to use shared memory. POSIX shared memory is actually a variation of mapped memory. The shared memory object can be mapped into the process address space using mmap() if necessary. Shared memory is usually used as an IPC mechanism, but you should be able to use it for your situation.
The following example code uses POSIX shared memory (shm_open() & shm_unlink()) in conjunction with a FILE * to write text to the shared memory object and then read it back.
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>   /* ftruncate() */

#define MAX_LEN 1024

int main(int argc, char **argv)
{
    int fd;
    FILE *fp;
    char buf[MAX_LEN];   /* a char array, not an array of pointers */

    fd = shm_open("/test", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, MAX_LEN);
    fp = fdopen(fd, "r+");

    fprintf(fp, "Hello_World!\n");
    rewind(fp);
    fscanf(fp, "%s", buf);
    fprintf(stdout, "%s\n", buf);

    fclose(fp);
    shm_unlink("/test");
    return 0;
}
Note: I had to pass -lrt to the linker when compiling this example with gcc on Linux.
Use mkstemp() to create a temporary file. It takes a char * argument and uses it as a template for the file's name, but it returns a file descriptor rather than a FILE *, so you'd wrap it with fdopen() (a short sketch follows this list).
Use tmpfile(). It returns FILE *, but has some security issues and also, you have to copy the string yourself.
Use mmap() (see Beej's Guide for help)
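Here is the mkstemp() sketch (the file name and "Version: 3" contents are made up for illustration):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    char name[] = "/tmp/configXXXXXX"; /* template; the XXXXXX is replaced */
    int fd = mkstemp(name);
    if (fd == -1) { perror("mkstemp"); return 1; }

    FILE *fp = fdopen(fd, "r+");       /* wrap the descriptor in a FILE * */
    if (fp == NULL) { perror("fdopen"); close(fd); return 1; }

    fputs("Version: 3\n", fp);         /* write the in-memory config text */
    rewind(fp);                        /* ...then hand fp to LoadFromFile */

    fclose(fp);
    unlink(name);                      /* remove the temporary file */
    return 0;
}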
As far as I understand, passing a pointer to a function essentially passes a copy of the pointer to the function in C. I have a FILE pointer that I pass to a function func(); func() reads a line from a file, and then, when we return to main(), I read another line from the file using the same FILE pointer.
However, while I would imagine that I'd read exactly the line from before func() was called, I actually read the next line after what func() had read. Can you please explain why the FILE pointer behaves this way?
This is my code:
#include <stdio.h>

#define STR_LEN 22

void func(FILE *fd);

int main() {
    FILE *fd;
    char mainString[STR_LEN];
    if (!(fd = fopen("inpuFile", "r"))) {
        fprintf(stderr, "Couldn't open file\n");
        return 1; /* don't keep going with a NULL stream */
    }
    func(fd);
    fgets(mainString, STR_LEN, fd);
    printf("mainString = %s\n", mainString);
    fclose(fd);
    return 0;
}

void func(FILE *fd) {
    char funcString[STR_LEN];
    fgets(funcString, STR_LEN, fd);
    printf("funcString = %s\n", funcString);
}
However, while I would imagine that I'd read the line exactly from before func was called ...
I can't imagine why you would imagine that. What if the FILE* references a network connection with no replay capability at all, where reading is consumption? Where would the line be stored so that you could read it again? There would be absolutely no place to put it.
Not only would I not imagine that, it's kind of crazy.
As far as I understand passing a pointer to a function essentially passes the copy of the pointer to the function in C.
Correct. But a copy of a pointer points to the very same object. If I point to a car and you copy me, you're pointing to the very same one and only car that I'm pointing to.
Because the FILE pointer points to some data that gets changed when the file is read or written.
So the pointer itself doesn't change (it still points to the handler structure of the file), but the data pointed to by that structure does.
Try passing the pointer as const FILE *: you'll see that you cannot, because fread (and other operations) alter the pointed-to data.
One way around this would be to duplicate the file descriptor, which dup does, but that doesn't work on buffered FILE objects, only raw file descriptors.
The problem is in your initial statement:
As far as I understand passing a pointer to a function essentially passes the copy of the pointer to the function in C.
This does not change much: whatever you are accessing as a pointer still holds the location of the FILE you are accessing. The whole point of using pointers as function arguments in C is that you can modify a value outside the scope of the function.
For example, common usage of an integer pointer as a function argument:
void DoSomethingCool(int *error);
Now using this code to catch the error would work like this:
int error = 0;
DoSomethingCool(&error);
if(error != 0)
printf("Something really bad happened!");
In other words, the function will actually modify the integer error through the pointer, by accessing its location and writing to it.
An important thing to keep in mind, to avoid these kinds of misunderstandings, is that all a pointer is, essentially, is the address of something.
So you could (in theory, simplifying everything a lot) think of an int * as simply an int whose value happens to be the address of some variable. Likewise, for a FILE *, you can think of it as an int whose value is the location of the FILE variable.
FILE *fd is a pointer only in the sense that its implementation uses the C construct called a "pointer". It is not a pointer in the sense of representing a file position.
FILE *fd represents a handle to a file object inside the I/O library, a struct that includes the actual position of the file. In a grossly simplified way, you can think of fd as a C pointer to a file pointer.
When you pass fd around your program, I/O routines make modifications to the file position. This position is shared among all users of fd. If a func() makes a change to that position by reading some data or by calling fseek, all users of the same fd will see the updated position.
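A quick way to see this shared position (a sketch using the question's inpuFile): ftell() reports the position stored inside the FILE object, and that position moves even though the pointer value itself never changes.

#include <stdio.h>

static void consume(FILE *f) {
    fgetc(f);                              /* read one byte via the copy */
}

int main(void) {
    FILE *f = fopen("inpuFile", "r");
    if (f == NULL) return 1;
    printf("before: %ld\n", ftell(f));     /* prints 0 */
    consume(f);
    printf("after:  %ld\n", ftell(f));     /* prints 1: position moved */
    fclose(f);
    return 0;
}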
I'm writing down a function that should save 3 structures (2 of them are arrays of structs) in a binary file. Here's my function:
void saveFile(Struct1 *s1, Struct2 *s2, Struct3 s3) {
    FILE *fp = NULL;
    fp = fopen("save.bin", "w+b");
    if (fp == NULL) {
        printf("Save failed.\n");
    }
    fwrite(s1, sizeof(Struct1), s3.nElements, fp);
    fwrite(s2, sizeof(Struct2), NELEMENTS, fp);
    fwrite(&s3, sizeof(Struct3), 1, fp);
    printf("Save done.\n");
}
s1 has s3.nElements elements, s2 has NELEMENTS (that's a constant), and s3 is just one struct, not an array. When I try to open save.bin with a hex editor it gives very different results from the ones I was expecting, and I'm wondering if I used the fwrite function correctly, especially for the arrays of structs.
There are small issues with your function that might cause problems:
you define the function as taking s3 by value. Why not pass a pointer to the third struct? Is the saveFile function properly declared before the calling code? Are you sure the calling code passes the struct by value?
You forget to close the stream. The handle gets lost, and the contents are not flushed to disk until the program exits.
You open the file in "w+b" mode: write with read. It is correct to use binary mode, but unnecessary to add the + for read. Just use "wb".
If fopen fails, you output a diagnostic message, but you do not return from the function. You will invoke undefined behavior when trying to write to a NULL stream pointer.
Regarding your question about why the dump of the file does not correspond to what you expect: give us more information, such as the definitions of the different structures and the hex dump. Here are some ideas:
Some of the fields in the structures might need a specific alignment and thus be separated from the previous field by padding bytes. The values of those padding bytes are not necessarily 0: if the structures are in automatic storage or allocated with malloc, their initial state is undefined and can change as a side effect of storing other fields.
Integers can have different sizes and be stored in little-endian or big-endian order in the file, depending on the specific architecture your program is compiled for. For this reason, values stored by your program should only be read back by the same or reasonably similar code running on the same architecture and OS.
If your structures contain pointers, you cannot really make sense from the values stored in the output file.
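Putting those fixes together, a corrected sketch might look like this (assuming the question's Struct1/Struct2/Struct3 and NELEMENTS definitions; the third struct is now passed by pointer):

#include <stdio.h>

int saveFile(const Struct1 *s1, const Struct2 *s2, const Struct3 *s3) {
    FILE *fp = fopen("save.bin", "wb");   /* "wb": write-only, binary */
    if (fp == NULL) {
        printf("Save failed.\n");
        return -1;                        /* don't write to a NULL stream */
    }
    fwrite(s1, sizeof(Struct1), s3->nElements, fp);
    fwrite(s2, sizeof(Struct2), NELEMENTS, fp);
    fwrite(s3, sizeof(Struct3), 1, fp);
    fclose(fp);                           /* flush the contents to disk */
    printf("Save done.\n");
    return 0;
}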
When reading K&R, I became interested in how the file position is determined. By file position, I mean where in the file the stream is currently reading or writing. I think it must have something to do with the file pointer, or the piece of data it's pointing to. So I checked Stack Overflow and found the following answer:
Does fread move the file pointer?
The answer indicates that file pointer will change with the change of file position. This makes me very confused, because in my understanding, a file pointer for a certain file should always point to the same address, where information about this file is stored. So I wrote a small piece of code, trying to find the answer:
#include <stdio.h>

int main(void)
{
    char s[1000];
    FILE *fp, *fp1, *fp2;
    fp = fopen("input", "r");
    fp1 = fp; /* File pointer before fread */
    fread(s, sizeof(char), 100, fp);
    fp2 = fp; /* File pointer after fread */
    printf("%d\n", (fp1 == fp2) ? 1 : -1);
    fclose(fp);
    return 0;
}
It gives the output 1, which I believe indicates that the file pointer actually doesn't move and is still pointing to the same address. I have also changed the fread line to an fseek, which gave the same output. So does the file pointer move with the change of file position, or where am I wrong in my verification process?
Thanks!
I think you are confusing the general concept of pointers in C with the nomenclature of a "file pointer". FILE is just a structure that contains most of the "housekeeping" attributes that the C stdio runtime library needs when using stdio functions such as fopen(), fread(), etc. Here is an example of the structure:
typedef struct {
char *fpos; /* Current position of file pointer (absolute address) */
void *base; /* Pointer to the base of the file */
unsigned short handle; /* File handle */
short flags; /* Flags (see FileFlags) */
short unget; /* 1-byte buffer for ungetc (b15=1 if non-empty) */
unsigned long alloc; /* Number of currently allocated bytes for the file */
unsigned short buffincrement; /* Number of bytes allocated at once */
} FILE;
Note that this may be somewhat platform-dependent, so don't take it as gospel. So when you call fopen(), the underlying library function interacts with the O/S's file system APIs and caches relevant information about the file, buffer allocation, etc, in this structure. The fopen() function allocates memory for this structure, and then returns the address of that memory back to the caller in the form of a C Pointer.
Assigning the pointer's value to another pointer has no effect on the attributes inside the FILE structure. However, the FILE structure, internally, may have indexes or "pointers" into the underlying O/S file. Hence the confusion in terminology. Hope that helps.
You are right: fp is never changed by fread, fseek, or other f... functions. Except, of course, if you do fp = fopen(...), but then you are assigning the return value of fopen to fp, so of course fp changes.
Remember, in C parameters are passed by value, so fread cannot change its value.
But fread does change the internal structure fp points to.
You are confusing a file pointer, under the common definition, with the pointers inside the file structure.
Normally, by the term file pointer we refer to a pointer to a FILE structure. That structure contains all the variables necessary to manage file access. It is created upon a successful opening of a file, and remains the same (same address) until you fclose() the file (at which point it becomes undefined).
Inside the FILE structure there are many pointers that point to the file blocks on disk and to the position inside the current record. These pointers, managed by the file I/O routines, change when the file is accessed (read or written).
And these pointers are that to which the answer you cited refers.
What about using a std::vector<char> or std::vector<unsigned char> as a FILE* argument when invoking a C function that expects to receive a pointer to a file?
Personally, I can't recall any object or element from the standard library that can be used as a C-style file.
Why I want to do this:
get out the user space as soon as possible, so I quickly load everything into a vector
"centralize" memory management, since I use vectors a lot, I just use yet another vector for dealing with files
simplifies algorithms and functions, because of basically the same reasons as my previous point
On some platforms, the standard library contains functions that can be used for this purpose. For example, on Linux the following two functions are available:
fmemopen: Create a FILE* from a char buffer.
fopencookie: Create a FILE* with custom functions.
According to the linked man pages, fmemopen is part of POSIX-2008 and fopencookie is a GNU extension.
In a very limited sense, you can do it for reading only by using the fmemopen function *.
Instantiate a vector
Fill the vector with data as needed
Call fmemopen, passing it vect.data(), vect.size(), and the "r" flag
Use the resultant FILE * to pass to functions that need to read from a file.
* It does not mean that it is worth doing it in this way, though.
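A minimal C sketch of the fmemopen() approach (POSIX-2008; in the C++ case the same call would take vect.data() and vect.size(), and the "Version: 3" contents here are made up):

#include <stdio.h>
#include <string.h>

int main(void) {
    char data[] = "Version: 3\n";             /* stand-in for the buffer */
    FILE *fp = fmemopen(data, strlen(data), "r");
    if (fp == NULL) { perror("fmemopen"); return 1; }

    int version;
    if (fscanf(fp, "Version: %d", &version) == 1)
        printf("version = %d\n", version);    /* ordinary stdio reads work */

    fclose(fp);
    return 0;
}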
Definitely not - functions like fprintf expect to be able to dereference that FILE* and get a FILE, and a std::vector is definitely not a FILE. In glibc, a FILE is a typedef for something like this:
struct _IO_FILE {
int _flags; /* High-order word is _IO_MAGIC; rest is flags. */
#define _IO_file_flags _flags
/* The following pointers correspond to the C++ streambuf protocol. */
/* Note: Tk uses the _IO_read_ptr and _IO_read_end fields directly. */
char* _IO_read_ptr; /* Current read pointer */
char* _IO_read_end; /* End of get area. */
char* _IO_read_base; /* Start of putback+get area. */
char* _IO_write_base; /* Start of put area. */
char* _IO_write_ptr; /* Current put pointer. */
char* _IO_write_end; /* End of put area. */
char* _IO_buf_base; /* Start of reserve area. */
char* _IO_buf_end; /* End of reserve area. */
/* The following fields are used to support backing up and undo. */
char *_IO_save_base; /* Pointer to start of non-current get area. */
char *_IO_backup_base; /* Pointer to first valid character of backup area */
char *_IO_save_end; /* Pointer to end of non-current get area. */
struct _IO_marker *_markers;
struct _IO_FILE *_chain;
int _fileno;
#if 0
int _blksize;
#else
int _flags2;
#endif
_IO_off_t _old_offset; /* This used to be _offset but it's too small. */
#define __HAVE_COLUMN /* temporary */
/* 1+column number of pbase(); 0 is unknown. */
unsigned short _cur_column;
signed char _vtable_offset;
char _shortbuf[1];
/* char* _save_gptr; char* _save_egptr; */
_IO_lock_t *_lock;
#ifdef _IO_USE_OLD_IO_FILE
};
A FILE * and a std::vector<char> * are two different types that are not directly compatible.
An easy example of this: a std::vector stores in memory all of the data that the object uses. A FILE *, on the other hand, holds an opaque handle that lets functions request more data from the operating system.
So, if you were to do:
std::string get_line(FILE *) { ... }
And you called it as:
std::vector<char> v;
std::string s = get_line((FILE *) &v);
I would expect your application to exhibit a large pile of undefined behavior.
Well, from your description, it seems like you want something that can be accessed either as a block of memory or as a (C-style) file stream; if so, it may be that, rather than multiple round-trips into a vector<> type, you might be better off using a memory mapped file in parallel with a shared-mode FILE*.
It's been a while since I had need to do this, but it looks like there's a nice, lightweight library on SourceForge called fmstream, which gives you a C++ wrapper class of the same name (the latest source seems to include C++11 features; I would check to make sure that the released version does as well). There is also ifmstream, a read-only version.
To use a memory-mapped file, you instantiate it by opening it with the aforementioned class. This (I am assuming) will be an OS-level memory mapping of the file; and, indeed, this gives you 'direct' (at least from your code's point of view) access to the file. There are a number of different modes for this, but basically it's just using your file as a section of virtual memory. Thus, should you make changes to the file, you are actually changing the file on disk too. In the lazy case the changes are not written until you close the file or explicitly flush it, but there should be immediate modes that keep the data more or less in sync as you change the memory.
Not sure if writes are a specific part of your use case, but using a map may still be advantageous even if you aren't doing any writing. The best way to tell is to do some prototyping with data files that come as close as possible to replicating the kind of load you'll have when processing a real file, and do some performance testing. For the read-write case you can use fmstream; for a more performant read-only stream you can use ifmstream.
Anyhow, at the point you need a data pointer, you just call the data() method on an instance of one of the streams and access it as a large block of memory. As a memory-mapped file, it should lazily load the data as it's needed (remember it's an OS mechanism that's just being manipulated from C++, so most of the underlying machinery should be about as optimized as it reasonably can be for the general case) and not require loading everything at once.
For the C functions, you can use the memory-mapped file in tandem with a C FILE* opened in shared mode, directly with your C functions; in fact, you can likely just include the headers in an extern "C" { /* includes */ } block and link to the object code directly, assuming it's a static C library.
Once you're done with the file, all you have to do is clean up the stream (or let scoping take care of it by instantiating it with something like std::unique_ptr in the first place). If you're doing a lot of reads, you might want to open it, save it somewhere, and close it later; if you're doing all your processing at once, you might just want to open and close it in a function that wraps the functionality.
Hopefully that gives you more or less what you need, without a lot of extra work (although I do recommend you do a bit of reading on how memory mapping works at the OS level on your particular platform, so you don't get bitten by something working differently than you expected).
CAVEAT: I haven't looked too closely at this specific implementation, and, as I say, it's been a while since I used this technique, but I assume things haven't changed that much. If you run into issues, or need some clarification; let me know and I'll try to find some time to dig a little further...
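I haven't verified fmstream's exact API, so rather than guess at it, here is a plain POSIX sketch of the underlying mechanism it wraps: map a file into memory and treat it as an ordinary block of chars (the "input" file name is made up).

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("input", O_RDONLY);
    if (fd == -1) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) == -1) { perror("fstat"); close(fd); return 1; }

    /* The file's contents now appear as a read-only char block at p. */
    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    fwrite(p, 1, st.st_size, stdout);   /* use it like plain memory */

    munmap(p, st.st_size);
    close(fd);
    return 0;
}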
I'm learning about buffer overflows today and I came across many examples of vulnerable programs. The thing that makes me curious is whether there is any reason to work with a program's arguments like this:
int main(int argc, char *argv[])
{
char argument_buffer[100];
strcpy(argument_buffer, argv[1]);
if(strcmp(argument_buffer, "testArg") == 0)
{
printf("Hello!\n");
}
// ...
}
Instead of simply:
int main(int argc, char *argv[])
{
if(strcmp(argv[1], "testArg") == 0)
{
printf("Hello!\n");
}
}
Please note that I know about the cons of strcpy etc.; it's just an example. My question is: is there any real reason for using temporary buffers to store arguments from argv? I assume there isn't, but then I'm curious why it shows up in overflow examples while in reality it is never used. Maybe it's purely theoretical?
One possible real-world example: a program that renames *.foo to *.bar; you'll need both the original file name and a copy of it with the .foo part changed to .bar for the call to rename().
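A hypothetical sketch of that use-case: the copy exists so the extension can be rewritten in place while the original name is still needed for the rename() call.

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
    char newName[256];
    if (argc < 2 || strlen(argv[1]) >= sizeof(newName))
        return 1;

    strcpy(newName, argv[1]);              /* the copy of argv[1] */
    char *dot = strrchr(newName, '.');     /* find the ".foo" suffix */
    if (dot != NULL && strcmp(dot, ".foo") == 0) {
        strcpy(dot, ".bar");               /* same length, so it fits */
        rename(argv[1], newName);          /* needs both names at once */
    }
    return 0;
}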
IIRC, argv and its contents were not guaranteed to be writable and stable on all platforms in the old times. C89 / C90 / ANSI C standardized some of the existing practices. Similar for envp[]. It could also be that the habit of copying was inspired by the absence of memory protection on older platforms (such as MS-DOS). Normally (and nowadays) the OS and/or CRT takes care of copying the args from the caller's memory to the process's private memory arena.
Some programs prepend filenames with default paths:
void OpenLogFile (const char *fileName) {
char pathName[256];
sprintf(pathName, "/var/log/%s", fileName);
logFd = open(pathName, ...);
...
}
int main (int argc, char **argv) {
...
OpenLogFile(argv[i]);
...
}
If the entity that invokes the program passes in a name longer than 255 - 9 = 246 characters or so, sprintf writes past the end of pathName, and boom.
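A bounds-checked variant of the same function (a sketch, keeping the snippet's hypothetical logFd): snprintf reports how long the result would have been, so the overflow can be refused instead of happening.

#include <fcntl.h>
#include <stdio.h>

static int logFd;

void OpenLogFile(const char *fileName) {
    char pathName[256];
    int n = snprintf(pathName, sizeof(pathName), "/var/log/%s", fileName);
    if (n < 0 || (size_t)n >= sizeof(pathName)) {
        fprintf(stderr, "log file name too long\n");
        return;                 /* refuse instead of overflowing */
    }
    logFd = open(pathName, O_WRONLY | O_CREAT | O_APPEND, 0644);
}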
I am not answering this in terms of buffer overflow or security, but am answering strictly on why someone might want to make a copy of argv's contents.
If your program accepts a lot of arguments, like flags that would change execution path or processing mode, you might want to transfer argv's contents either directly to a log file, or store it temporarily in a buffer. If all decisions made on argv's contents occur in main and you still want to log argv's contents, you probably would not need to copy to a buffer.
If you depended on dispatched threads, processes, or even a subroutine making decisions based on the argv contents, you would probably want the argv values placed in a buffer, so you could pass them around.
Edit:
If you are worried about passing around a pointer, copy argv's contents to a fixed size buffer.
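For the logging case, a sketch of flattening argv into one buffer so it can be written to a log or handed to another thread later:

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
    char cmdline[1024] = "";
    for (int i = 0; i < argc; i++) {
        strncat(cmdline, argv[i], sizeof(cmdline) - strlen(cmdline) - 1);
        if (i + 1 < argc)                  /* space-separate the args */
            strncat(cmdline, " ", sizeof(cmdline) - strlen(cmdline) - 1);
    }
    printf("invoked as: %s\n", cmdline);   /* e.g. write this to a log */
    return 0;
}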