Feeding data to a C API expecting a filename

I'm writing a straightforward C program on Linux and want to use an existing library whose API expects data from a file: I must pass it a file name as a const char*. But my data, exactly the bytes that would be in such a file, is already sitting in a buffer allocated on the heap. There is plenty of RAM and I want high performance, so I'd like to avoid writing a temporary file to disk. What is a good way to feed the data to this API in a way that looks like a file?
Here's a cheap pretend version of my code:
marvelouslibrary.h:
int marvelousfunction(const char *filename);
normal-persons-usage.cpp, for which the library was originally designed:
#include "marvelouslibrary.h"
int somefunction(char *somefilename)
{
    return marvelousfunction(somefilename);
}
myprogram.cpp:
#include "marvelouslibrary.h"
int one_of_my_routines()
{
    byte* stuff = new byte[1000000];
    // fill stuff[] with...stuff!
    // stuff[] holds same bytes as might be found in a file
    /* magic goes here: make filename referring to stuff[] */
    return marvelousfunction( ??? );
}
To be clear, the marvelouslibrary does not offer any API functions that accept data by pointer; it can only read a file.
I thought of pipes and mkfifo(), but they seem meant for communicating between processes, and I am no expert at these things. Does a named pipe work okay when read and written from the same process? Is this a wise approach?
Or I could skip being clever and go with plan "B", which is to shuddup and just write a temp file. However, I'd like to learn something new and find out what's possible in this situation, besides getting high performance.

Given that you likely have a function like:
char *read_data(const char *fileName)
I think you will need to "skip being clever, go with plan "B" which is to shuddup and just write a temp file."
If you can dig around and find out whether the call you are making eventually calls another function that takes a FILE * or an int file descriptor, then you can do something better.
One thought that does come to mind: can you change your code to write to a memory-mapped file instead of to the heap? That way you would already have a file on disk, you would avoid the extra copy (though the data will still end up on disk), and you can still give the function call a file name.
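If it helps, here is a minimal C sketch of that idea, reusing marvelousfunction() from the question; the file name "data.bin" and the fixed 1 MB size are placeholders:
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#include "marvelouslibrary.h"

/* Sketch: fill a memory-mapped file instead of a heap buffer, then hand
 * the library the file's name.  "data.bin" is a placeholder. */
int one_of_my_routines(void)
{
    const size_t len = 1000000;
    int fd = open("data.bin", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;
    if (ftruncate(fd, (off_t)len) != 0) { close(fd); return -1; }

    unsigned char *stuff = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                MAP_SHARED, fd, 0);
    if (stuff == MAP_FAILED) { close(fd); return -1; }

    memset(stuff, 0x42, len);    /* fill stuff[] with...stuff! */

    msync(stuff, len, MS_SYNC);  /* flush so the library is sure to see the bytes */
    munmap(stuff, len);
    close(fd);

    return marvelousfunction("data.bin");
}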

I'm not sure what kind of input the library function wants: does it need a path/file name, an open file pointer, or an open file descriptor?
If you don't want to hack the library and the function wants a string (path to a file), try making the temporary file in /dev/shm.
Otherwise, mmap might be the best option; be sure to research posix_madvise() when using mmap() (or its counterpart posix_fadvise() if using a temporary file).
It looks like you're talking about very little data to begin with, so I don't think you'll see a performance impact whichever route you take.
Edit
Sorry, I just re-read your question; perhaps I read too fast. There is no way you are going to feed a function like:
char * foo(const char *filepath)
... with mmap().
If you cannot modify the library to accept a file descriptor instead of (or as an alternative to) the path, just use /dev/shm and a temporary file; it will be quite cheap.
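For what it's worth, a minimal sketch of the /dev/shm route, again assuming the question's marvelousfunction(); the helper name and template path are made up:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include "marvelouslibrary.h"

/* Sketch: write the buffer to a tmpfs-backed file under /dev/shm, hand
 * the library its name, then unlink it. */
int feed_via_devshm(const unsigned char *buf, size_t len)
{
    char path[] = "/dev/shm/mydata-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;

    FILE *fp = fdopen(fd, "wb");
    if (fp == NULL) { close(fd); unlink(path); return -1; }
    if (fwrite(buf, 1, len, fp) != len) { fclose(fp); unlink(path); return -1; }
    fclose(fp);                 /* flush the data before the library opens it */

    int rc = marvelousfunction(path);
    unlink(path);               /* the "file" never touches a real disk */
    return rc;
}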

You're on Linux; can't you just grab the source of the library and hack in the function you need? If it's useful to others, you could even send a patch to the original author, so it will be in future versions for everyone.

Edit: Sorry, I just re-read the question. With my advice below, you fork a spare process, so the question of "does it work in a single process" does not come up. I also see no reason you couldn't spawn a separate thread to do the push instead...
Not in the least elegant, but you could:
open a named pipe,
fork a streamer that does nothing but write the data to the pipe, and
pass the name of the pipe to the library,
which should be pretty robust...
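A rough C sketch of this, once more assuming the question's marvelousfunction(); the fifo path and helper name are illustrative, and it only works if the library reads the file strictly sequentially (a fifo cannot be seeked):
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include "marvelouslibrary.h"

/* Sketch: the child writes the buffer into a named pipe while the parent
 * hands the pipe's name to the library. */
int feed_via_fifo(const unsigned char *buf, size_t len)
{
    const char *path = "/tmp/marvelous-fifo";
    if (mkfifo(path, 0600) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) { unlink(path); return -1; }

    if (pid == 0) {                      /* child: the streamer */
        int fd = open(path, O_WRONLY);   /* blocks until the reader opens */
        if (fd >= 0) {
            size_t off = 0;
            while (off < len) {          /* handle partial writes */
                ssize_t n = write(fd, buf + off, len - off);
                if (n <= 0) break;
                off += (size_t)n;
            }
            close(fd);
        }
        _exit(0);
    }

    int rc = marvelousfunction(path);    /* parent: the library reads the fifo */
    waitpid(pid, NULL, 0);
    unlink(path);
    return rc;
}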

mmap(), perhaps?

Related

Saving data into the program?

Is it possible for one portable program to keep save data inside the application?
I don't want the program to create folders or files.
To do it in a portable way, you should make no assumptions about the architecture or operating system: you may or may not have access to the executable in the first place (it could be argv[0], but maybe it isn't), and even if you can locate the executable file, you might not have the rights to open and modify it.
If, anyway, you want to try it, you could:
Check that argv[0] is a file, that you have read and write permission on it, and that it is really your code (by looking for a distinctive string you leave somewhere in your code, for example).
Choose a string to mark your modifications, for example "Edenia", and check whether the last bytes of the file match it. If so, the file has been previously modified, and you can read your data and process it.
When you want to store additional data, append it to the end of the file (if it has not been modified yet), or replace the data you appended previously. Don't forget to add the mark at the end of the file ("Edenia", or whatever); a rough sketch follows.
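Here is that rough sketch, assuming exe_path (e.g. argv[0]) really names the running binary and that it is writable; the helper names and the marker handling are illustrative only, and the caveat below still applies:
#include <stdio.h>
#include <string.h>

#define MARK "Edenia"

/* Sketch only: does the file end with our marker? */
static int has_mark(FILE *fp)
{
    char tail[sizeof MARK];
    if (fseek(fp, -(long)(sizeof MARK - 1), SEEK_END) != 0) return 0;
    if (fread(tail, 1, sizeof MARK - 1, fp) != sizeof MARK - 1) return 0;
    tail[sizeof MARK - 1] = '\0';
    return strcmp(tail, MARK) == 0;
}

/* Returns 1 if a previous block was found, 0 if we appended fresh data,
 * -1 if the file could not be opened for writing. */
int save_into_self(const char *exe_path, const void *data, size_t len)
{
    FILE *fp = fopen(exe_path, "r+b");
    if (fp == NULL) return -1;          /* maybe we can't write to it */

    int already = has_mark(fp);
    /* A real version would locate and replace the old block here;
     * this sketch only handles the "not modified yet" case. */
    if (!already && fseek(fp, 0, SEEK_END) == 0) {
        fwrite(data, 1, len, fp);
        fwrite(MARK, 1, sizeof MARK - 1, fp);
    }
    fclose(fp);
    return already;
}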
Anyway, I still think this is not the proper way to store data: try to use external storage (files, database, etc) if you can.

What are some best practices for file I/O in C?

I'm writing a fairly basic program for personal use but I really want to make sure I use good practices, especially if I decide to make it more robust later on or something.
For all intents and purposes, the program accepts some input files as arguments, opens them using fopen(), reads from the files, does stuff with that information, and then saves the output as a few different files in a subfolder. E.g., if the program is in ~/program then the output files are saved in ~/program/csv/.
I just output directly to the files, for example output = fopen("./csv/output.csv", "w");, print to it with fprintf(output, "%f,%f", data1, data2); in a loop, and then close with fclose(output); and I just feel like that is bad practice.
Should I be saving it in a temp directory while it's being written to and then moving it when it's finished? Should I be using more advanced file I/O libraries? Am I just completely overthinking this?
Best practices in my eyes:
Check every call to fopen, printf, puts, fprintf, fclose etc. for errors (see the sketch after this list)
use getchar if you must, fread if you can
use putchar if you must, fwrite if you can
avoid arbitrary limits on input line length (might require malloc/realloc)
know when you need to open output files in binary mode
use Standard C, forget conio.h :-)
newlines belong at the end of a line, not at the beginning of some text, i.e. it is printf("hello, world\n"); and not "\nHello, world" like those misled by the Mighty William H. often write to cope with the silliness of their command shell. Outputting newlines first breaks line-buffered I/O.
if you need more than 7-bit ASCII, choose Unicode (the most common encoding is UTF-8, which is ASCII-compatible). It's the last encoding you'll ever need to learn. Stay away from codepages and ISO-8859-*.
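The sketch promised in the first item: a small copy routine that checks every stdio call and reports failures; the file names are whatever you pass in.
#include <stdio.h>

/* Sketch: copy src to dst with fread/fwrite, checking every call. */
int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (in == NULL) { perror(src); return -1; }

    FILE *out = fopen(dst, "wb");
    if (out == NULL) { perror(dst); fclose(in); return -1; }

    char buf[4096];
    size_t n;
    int rc = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) { perror(dst); rc = -1; break; }
    }
    if (ferror(in)) { perror(src); rc = -1; }

    fclose(in);
    if (fclose(out) != 0) { perror(dst); rc = -1; }   /* the last write may fail here */
    return rc;
}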
Am I just completely overthinking this?
You are. If the task is simple, don't build a complicated solution on purpose just because it feels "more professional". While you're a beginner, focus on code readability; it will make your life and others' easier.
It's fine. I/O is fully buffered by default with stdio file functions, so you won't be writing to the file with every single call of fprintf. In fact, in many cases, nothing will be written to it until you call fclose.
It's good practice to check the return of fopen, to close your files when finished, etc. Let the OS and the compiler do their job in making the rest efficient, for simple programs like this.
If no other program is checking for the presence of ~/program/csv/output.csv for further processing, then what you're doing is just fine.
Otherwise you can consider writing to a FILE * obtained by a call to tmpfile in stdio.h or some similar library call, and when finished copy the file to the final destination. You could also put down a lock file output.csv.lck and remove that when you're done, but that depends on you being able to modify the other program's behaviour.
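One common way to do the "write elsewhere, then move into place" part, sketched here with made-up names: write to a temporary name in the same directory and rename() it when done, so a watcher never sees a half-written output.csv (rename within one filesystem is atomic on POSIX systems).
#include <stdio.h>

/* Sketch: write the CSV to a temporary name, then rename it into place. */
int write_csv(double data1, double data2)
{
    const char *tmp = "./csv/output.csv.tmp";
    const char *dst = "./csv/output.csv";

    FILE *out = fopen(tmp, "w");
    if (out == NULL) return -1;

    if (fprintf(out, "%f,%f\n", data1, data2) < 0 || fclose(out) != 0) {
        remove(tmp);
        return -1;
    }
    return rename(tmp, dst) == 0 ? 0 : -1;
}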
You can make your own cat, cp, mv programs for practice.

What is the best way to truncate the beginning of a file in C?

There are many similar questions, but nothing that answers this specifically after googling around quite a bit. Here goes:
Say we have a file (could be binary, and much bigger too):
abcdefghijklmnopqrztuvwxyz
what is the best way in C to "move" the rightmost portion of this file to the left, truncating the beginning of the file? So, for example, "front-truncating" 7 bytes would change the file on disk to:
hijklmnopqrztuvwxyz
I must avoid temporary files, and would prefer not to use a large buffer to read the whole file into memory. One possible method I thought of is to use fopen with the "rb+" flag and constantly fseek back and forth, reading and writing to copy bytes from the offset back to the beginning, then SetEndOfFile to truncate at the end. That seems like a lot of seeking (possibly inefficient).
Another way would be to fopen the same file twice, and use fgetc and fputc with the respective file pointers. Is this even possible?
If there are other ways, I'd love to read all of them.
You could mmap the file into memory and then memmove the contents. You would have to truncate the file separately.
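A minimal sketch of that approach; the helper name is made up and error handling is minimal:
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sketch: map the file, slide the tail to the front, unmap, then cut the
 * file down.  "cut" is the number of bytes to drop from the front. */
int front_truncate_mmap(const char *path, off_t cut)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || cut > st.st_size) { close(fd); return -1; }

    char *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }

    memmove(p, p + cut, (size_t)(st.st_size - cut));
    munmap(p, (size_t)st.st_size);

    int rc = ftruncate(fd, st.st_size - cut);   /* the separate truncation step */
    close(fd);
    return rc;
}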
You don't have to use an enormous buffer, and the kernel is going to do the hard work for you, but yes, reading a bufferful from further along the file and writing it nearer the beginning is the way to do it if you can't afford the simpler job of creating a new file, copying what you want into it, and then copying the new (temporary) file over the old one. I wouldn't rule out the possibility that the copy-to-a-new-file approach, followed by moving or copying the new file over the old, will be faster than the shuffling process you describe. If the number of bytes to be removed were a disk block size rather than 7 bytes, the situation might be different, but probably not. The only disadvantage is that the copying approach requires more intermediate disk space.
Your outline approach will require the use of truncate() or ftruncate() to shorten the file to the proper length, assuming you are on a POSIX system. If you don't have truncate(), then you will need to do the copying.
Note that opening the file twice will work OK if you are careful not to clobber the file when opening for writing - using "r+b" mode with fopen(), or avoiding O_TRUNC with open().
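A sketch of the shuffle-and-truncate approach with a single FILE * opened in "r+b" mode; the helper name and buffer size are arbitrary:
#include <stdio.h>
#include <unistd.h>

/* Sketch: read a block from the "read position", write it at the "write
 * position", then ftruncate the now-duplicated tail away. */
int front_truncate_copy(const char *path, long cut)
{
    FILE *fp = fopen(path, "r+b");
    if (fp == NULL) return -1;

    char buf[8192];
    long rpos = cut, wpos = 0;
    size_t n;

    for (;;) {
        if (fseek(fp, rpos, SEEK_SET) != 0) break;
        n = fread(buf, 1, sizeof buf, fp);
        if (n == 0) break;
        if (fseek(fp, wpos, SEEK_SET) != 0) break;  /* reposition before writing */
        if (fwrite(buf, 1, n, fp) != n) break;
        rpos += (long)n;
        wpos += (long)n;
    }
    fflush(fp);
    int rc = ftruncate(fileno(fp), wpos);           /* shorten to the proper length */
    fclose(fp);
    return rc;
}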
If you are using Linux with kernel 3.15 or newer, you can use
#include <fcntl.h>
int fallocate(int fd, int mode, off_t offset, off_t len);
with the FALLOC_FL_COLLAPSE_RANGE flag.
http://manpages.ubuntu.com/manpages/disco/en/man2/fallocate.2.html
Note that not all file systems support it but most modern ones such as ext4 and xfs do.
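A minimal sketch (helper name made up); note that FALLOC_FL_COLLAPSE_RANGE requires both the offset and the length to be multiples of the filesystem block size, so it cannot remove exactly 7 bytes:
#define _GNU_SOURCE             /* fallocate() and FALLOC_FL_COLLAPSE_RANGE */
#include <fcntl.h>
#include <unistd.h>

/* Sketch: remove "len" bytes from the start of a file without copying. */
int front_truncate_collapse(const char *path, off_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) return -1;

    int rc = fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, 0, len);
    close(fd);
    return rc;   /* fails with EOPNOTSUPP on filesystems without support */
}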

How would I go about checking the file system I'm working on in C

I'm making a program and one of the things it needs to do is transfer files. I would like to be able to check, before I start moving files, whether the file system supports files of size X. What is the best way of going about this?
Go ahead and use a function like ftruncate to create a file of the desired size in advance, before the move, and do the appropriate error handling in case it fails.
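A sketch of that idea; since ftruncate() alone usually just creates a sparse file, this variant uses posix_fallocate() to actually reserve the blocks, and the probe-file name is whatever you choose:
#include <fcntl.h>
#include <unistd.h>

/* Sketch: try to reserve "size" bytes in a throwaway probe file, then
 * delete it.  Returns 1 if the space could be reserved, 0 otherwise. */
int can_hold(const char *probe_path, off_t size)
{
    int fd = open(probe_path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0) return 0;

    int err = posix_fallocate(fd, 0, size);   /* 0 on success, error number on failure */
    close(fd);
    unlink(probe_path);
    return err == 0;
}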
There's no generic API for this in standard C. You could simply try creating a file and writing junk to it until it is the size you want, then deleting it, but even that isn't guaranteed to give you the information you need: another process might come along and write a large file between your test and your transfer, taking up the space you were hoping to use.

Avoiding the use of temporary files, when the function wants a FILE * passed

I currently use C++ to do some graph related computation using boost::graph.
boost::graph can write its graph out as a dot file, and I use a std::stringstream to capture that output, so the contents of the dot file reside in memory.
I want to use the dot file to visualize the graph (as fast as possible): generate the dot file, generate an SVG from it, and draw it onto some canvas. I want to avoid using temporary files for this, as the graphs will be small and memory is available anyway.
However, graphviz's libgraph only offers the function extern Agraph_t *agread(FILE *); the only way I can imagine making this work is to hack around in the FILE handle struct internals, which is really not portable.
How would you let a library read your memory contents as a file on Unix/Linux?
I just found out that libcgraph from GraphViz allows supplying an overridden reader here, but so far the documentation doesn't point me to anything useful.
Well, it is arguably a bug in the API, but here's an idea. This is assuming that the agread() function would read the file in as binary data.
Note that I am not familiar with the API you're using, but I hope this may be useful anyway.
Map a file into memory using mmap().
Use that memory region to do your graph construction.
When it comes time to call agread(), open the file as a FILE * (fdopen() on the descriptor if you didn't close it, or fopen() on the path otherwise).
Pass the FILE * struct.
Edit: Or, ignore my answer and use the fmemopen() call. It probably is exactly what you need. I didn't want to delete my answer though, in case someone is currently writing a response :-).
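A minimal sketch of fmemopen() in action; here the stream is simply read back with fgets() to show it behaves like a file, and in your case you would pass the FILE * to agread() instead:
#define _POSIX_C_SOURCE 200809L   /* fmemopen() */
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char dot[] = "digraph g { a -> b; }\n";

    /* Wrap the in-memory buffer in a read-only FILE * */
    FILE *fp = fmemopen((void *)dot, strlen(dot), "r");
    if (fp == NULL) return 1;

    char line[128];
    while (fgets(line, sizeof line, fp) != NULL)
        fputs(line, stdout);      /* a real caller would hand fp to agread() */

    fclose(fp);
    return 0;
}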
You could create a pipe with pipe(), write the data into the input end and use fdopen() to turn the output file descriptor into a filehandle suitable for passing into agread().
However, this will only work if you're sure that the data is less than PIPE_BUF bytes; otherwise the write might block forever, since there's nothing reading from the other end.
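A sketch of that approach (helper name made up); it returns a FILE * you can hand to agread(), and as noted it is only safe when the whole payload fits in the pipe's buffer:
#include <stdio.h>
#include <unistd.h>

/* Sketch: write the data into the write end of a pipe, close it, and wrap
 * the read end in a FILE *. */
FILE *stream_from_small_buffer(const char *data, size_t len)
{
    int fds[2];
    if (pipe(fds) != 0) return NULL;

    if (write(fds[1], data, len) != (ssize_t)len) {  /* would block if too big */
        close(fds[0]);
        close(fds[1]);
        return NULL;
    }
    close(fds[1]);                 /* the reader sees EOF after the data */

    FILE *fp = fdopen(fds[0], "r");
    if (fp == NULL) close(fds[0]);
    return fp;                     /* pass this to agread(); fclose() when done */
}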
In general, using temporary files is much easier and more reliable. Just use tmpfile() to get a file handle, write the data into it, rewind and pass it to agread():
FILE *fh = tmpfile();
fputs(data, fh);
rewind(fh);
Agraph_t *graph = agread(fh);
fclose(fh);
(Of course, you should check for errors, which I didn't for the sake of brevity.)
If you are willing to use a GNU libc extension, you can open a C string as a FILE*; the documentation is at http://www.gnu.org/s/libc/manual/html_node/String-Streams.html.
On Windows, you can open a named pipe with fopen.
FILE* f = fopen("\\\\.\\Pipe\\<pipe name>", "rb");
So you can create a pipe in a separate thread where you push the data on it, and agread will read from it without a need for a temporary file.
