Load file content into memory, C

I will be dealing with really huge files, of which I want to load only part of the content into memory. So I was wondering whether the command:
FILE* file=fopen("my/link/file.txt", "r");
loads the whole file content into memory, or whether it is just a pointer to the content. After I open the file I use fgets() to read it line by line.
And what about fwrite()? Do I need to open and close the file every time I write something so that it doesn't overload, or is that managed in the background?
One more thing: is there a nice bash command like "time" that could tell me the peak memory usage of my executed program? I am using OS X.

As per the man page for fopen(),
The fopen() function opens the file whose name is the string pointed to by path and associates a stream with it.
So, no, it does not load the content of the file into memory or elsewhere.
To operate on the returned file pointer, as you already know, you need to use fgets() and family.
Also, once you have opened the file and obtained a pointer, you can use that pointer any number of times to write to the file until you fclose() it (remember to open the file in append mode). You don't need to open and close the file for every read or write made through the pointer.
Also, FWIW, if you want to move the file position back and forth, you may find fseek() handy.
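To illustrate the point about fgets(), here is a minimal sketch that walks through a file of any size while keeping only one fixed-size chunk in memory. The function name count_lines and the 4096-byte buffer size are assumptions made for this example, not something taken from the question.

```c
#include <stdio.h>
#include <string.h>

/* Count the lines of a file while holding only one fixed-size chunk of it
 * in memory. Returns -1 if the file cannot be opened.
 * (count_lines is a hypothetical helper name.) */
long count_lines(const char *path)
{
    FILE *fp = fopen(path, "r");      /* opens a stream; no content is read yet */
    if (fp == NULL)
        return -1;

    char buf[4096];                   /* the only file data kept by this function */
    long lines = 0;
    int at_line_start = 1;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        size_t len = strlen(buf);
        if (at_line_start)
            lines++;                  /* this chunk begins a new line */
        /* A chunk without a trailing newline means the line continues. */
        at_line_start = (len > 0 && buf[len - 1] == '\n');
    }

    fclose(fp);
    return lines;
}
```

Calling count_lines("my/link/file.txt") would read the file chunk by chunk; lines longer than the buffer are simply delivered in several pieces.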

fopen does not load the whole file into memory. It creates a stream backed by a file descriptor, which is like a pointer to an entry in the open file table.
The open file table entry in turn points to the location of the file on disk.
If you want to go to a particular place in the file, use fseek.
Another option is to use mmap. This creates a new mapping in the virtual address space of the calling process, so you can access the file like an array. (The whole file is not loaded into memory; the paging mechanism loads the data on demand.)
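The mmap approach could look roughly like the sketch below, which maps a file read-only and indexes it like an array. It assumes a POSIX system; the helper name count_byte_mmap is made up for this example.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count how often `target` occurs in a file by mapping it read-only.
 * Returns -1 on error. (count_byte_mmap is a hypothetical helper name.) */
long count_byte_mmap(const char *path, unsigned char target)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {            /* mmap rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    unsigned char *data = mmap(NULL, (size_t)st.st_size, PROT_READ,
                               MAP_PRIVATE, fd, 0);
    close(fd);                        /* the mapping stays valid after close */
    if (data == MAP_FAILED)
        return -1;

    long hits = 0;
    for (off_t i = 0; i < st.st_size; i++)   /* plain array indexing */
        if (data[i] == target)
            hits++;

    munmap(data, (size_t)st.st_size);
    return hits;
}
```

Pages are brought in by the kernel as the loop touches them, so the resident memory need not grow to the size of the file.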

fopen does not read the file, fread and fgets and similar functions do.
Personally I've never tried reading and writing a file at the same time.
It should work, though.
You can use multiple file pointers to the same file.
The shell's built-in time does not report memory consumption, but on OS X /usr/bin/time -l prints the maximum resident set size. Otherwise, the simplest way is to look at top. There are also malloc/new replacement libraries which can track this for you.
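Peak memory can also be queried from inside the program. The sketch below uses the POSIX getrusage() call; the wrapper name peak_rss_raw is an assumption for this example. Note that the unit of ru_maxrss differs between platforms: bytes on OS X, kilobytes on Linux.

```c
#include <sys/time.h>
#include <sys/resource.h>

/* Peak resident set size of the calling process as reported by the kernel.
 * The unit is platform-specific: bytes on OS X, kilobytes on Linux.
 * Returns -1 on failure. (peak_rss_raw is a hypothetical helper name.) */
long peak_rss_raw(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_maxrss;
}
```

Printing this value just before the program exits gives a rough figure for the maximum memory it held at any point.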

loads the whole file content into memory or it is just a pointer to the content?
No.
fopen() opens the file with the specified filename and associates it with a stream that is identified by the FILE pointer.
fread() can be used to read file contents into a buffer.
Multiple read/write operations can be carried out without reopening the file.
Functions like rewind() and fseek() can be used to change the current position in the file.
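As a sketch of combining fseek() and fread(), the following reads only a slice of a file into a caller-supplied buffer. The helper name read_chunk and its parameters are invented for this example.

```c
#include <stdio.h>

/* Read up to `n` bytes starting at byte `offset` of the file at `path`.
 * Returns the number of bytes actually read (0 on any failure).
 * (read_chunk is a hypothetical helper name.) */
size_t read_chunk(const char *path, long offset, void *out, size_t n)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;

    size_t got = 0;
    if (fseek(fp, offset, SEEK_SET) == 0)   /* move the file position */
        got = fread(out, 1, n, fp);         /* read only the requested slice */

    fclose(fp);
    return got;
}
```

Only the n bytes in the buffer (plus the stream's internal buffer) are held in memory, regardless of how large the file is.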

Related

confused about using ftell() to check if the file is empty

I want to add a structure to a binary file, but first I need to check whether the file has previous data stored in it. If not, I can add the structure; otherwise I'll have to read all the stored data and insert the structure in its correct place. But I got confused about how to check if the file is empty. I thought about trying something like this:
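long size = 0;
if (fp != NULL)
{
    fseek(fp, 0, SEEK_END);   /* jump to the end of the file */
    size = ftell(fp);         /* position at the end == size in bytes */
    rewind(fp);               /* go back to the start */
}
if (size == 0)
{
    // print your error message here
}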
But if the file is empty or not yet created, how can the file pointer not be NULL? What's the point of using ftell() if I can simply do something like this:
if (fp == NULL) {
    fp = fopen("data.bin", "wb");
    fwrite(&record, sizeof(record), 1, fp);   /* record: the structure to store */
    fclose(fp);
}
I know that NULL can be returned in other cases, such as protected files, but I still can't understand how using ftell() is effective when fopen will always return NULL if the file is empty. Any help will be appreciated :)
i need to check whether the file has previous data stored in it
There might be no portable and robust way to do that (that file might change during the check, because other processes are using it). For example, on Unix or Linux, that file might be opened by another process writing into it while your own program is running (and that might even happen between your ftell and your rewind). And your program might be running in several processes.
You could use operating system specific functions. For POSIX (including Linux and many Unixes like Mac OS X or Android), you might use stat(2) to query the file status (including its size with st_size). But after that, some other process might still write data into that file.
You might consider advisory locking, e.g. with flock(2), but then you adopt the system-wide convention that every program using that file would lock it.
You could use some database with ACID properties. Look into sqlite or into RDBMS systems like PostgreSQL or MariaDB, or an indexed file library like gdbm.
You can continue coding with the implicit assumption (but be aware of it) that only your program is using that file, and that your program has at most one process running it.
if the file is empty [...] how can the file pointer not be NULL ?
As Increasingly Idiotic answered, fopen can fail, but it usually does not fail on empty files. Of course, you need to handle fopen failure (see also this). So most of the time your fp would be valid, and your code chunk using ftell and rewind (assuming no other process is changing that file simultaneously) is an approximate way to check that the file is empty. BTW, if you read something from that file (e.g. with fread or fgetc), that read would fail if your file was empty, so you probably don't need to check its emptiness beforehand.
A POSIX-specific way to query the status (including size) of an fopen-ed file is to use fileno(3) and fstat(2) together, as in fstat(fileno(fp), &mystat) after declaring struct stat mystat;
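The fileno/fstat combination might be wrapped as in the sketch below. It assumes a POSIX system; the helper name stream_size is invented for this example. Keep in mind that fstat only sees data that has already been flushed to the file, and that another process may still change the file afterwards.

```c
#define _POSIX_C_SOURCE 200809L   /* exposes fileno() under strict ISO C modes */
#include <stdio.h>
#include <sys/stat.h>

/* Size in bytes of the file behind an open stream, or -1 on error.
 * Does not move the stream position, unlike the fseek/ftell approach.
 * (stream_size is a hypothetical helper name.) */
long long stream_size(FILE *fp)
{
    struct stat st;
    if (fp == NULL || fstat(fileno(fp), &st) != 0)
        return -1;
    return (long long)st.st_size;
}
```

A result of 0 then means the file is currently empty, while -1 means the size could not be determined at all.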
fopen() does not return NULL for empty files.
From the documentation:
If successful, returns a pointer to the object that controls the opened file stream ... On error, returns a null pointer.
NULL is returned only when the file could not be opened. The file could fail to open due to any number of reasons such as:
The file doesn't exist
You don't have permissions to read the file
The file cannot be opened multiple times simultaneously.
More possible reasons in this SO answer
In your case, if fp == NULL you'll need to figure out why fopen failed and handle each case accordingly. In most cases, an empty file will open just fine and return a non NULL file pointer.
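One way to figure out why fopen failed is to look at errno right after the call, as in this sketch. The helper name probe_open is made up here, and the code relies on POSIX behaviour: ISO C itself does not promise that fopen sets errno.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Try to open `path` for reading. Returns 0 if it opened (the stream is
 * closed again), otherwise the errno value describing the failure, or -1
 * if the platform did not set errno.
 * (probe_open is a hypothetical helper name.) */
int probe_open(const char *path)
{
    errno = 0;
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        int err = errno;              /* save before other calls clobber it */
        fprintf(stderr, "cannot open %s: %s\n", path, strerror(err));
        return err != 0 ? err : -1;
    }
    fclose(fp);
    return 0;
}
```

With this, a missing file would typically show up as ENOENT and a permission problem as EACCES, while an existing empty file simply returns 0.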

File Path to in memory file

I have a void *buffer that is an instance of a file in RAM.
The file type is in a format that must be parsed by the API given.
Unfortunately, the only way to open this file type through the API is to supply the API with the file path:
sample_api_open(char *file_name, ...);
I understand that shm_open returns the file descriptor, but the API only takes a file path.
Is there a work around to read this type of file in memory?
Is there a work around to read this type of file in memory?
Dump the content of the buffer into a temporary file on /tmp.
On many modern *NIX systems, /tmp is a synthetic, in-memory file system (for example tmpfs on Linux). Only under memory pressure might the content hit the disk, through swapping.
If the amount of data is too large to hold twice, then after dumping the content into /tmp you can free the local memory and mmap() the content of the file.
Instead of using POSIX shared memory, you could open a temporary file and mmap() it. Then make the buffer end up in the mmap()-ed region so you can finally call the API on the temporary file.
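Dumping the buffer into a temporary file could be sketched as follows, assuming a POSIX system with a writable /tmp. The helper name buffer_to_temp_file and the "membuf-" file name prefix are invented for this example.

```c
#define _POSIX_C_SOURCE 200809L   /* exposes mkstemp() and fdopen() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write `len` bytes from `buf` into a freshly created file under /tmp.
 * On success returns 0 and stores the file's path in `path_out`.
 * (buffer_to_temp_file is a hypothetical helper name.) */
int buffer_to_temp_file(const void *buf, size_t len,
                        char *path_out, size_t path_cap)
{
    static const char tmpl[] = "/tmp/membuf-XXXXXX";
    if (path_cap < sizeof tmpl)
        return -1;
    memcpy(path_out, tmpl, sizeof tmpl);

    int fd = mkstemp(path_out);       /* creates the file, fills in the X's */
    if (fd < 0)
        return -1;

    FILE *fp = fdopen(fd, "wb");
    if (fp == NULL) {
        close(fd);
        unlink(path_out);
        return -1;
    }

    size_t written = fwrite(buf, 1, len, fp);
    if (fclose(fp) != 0 || written != len) {
        unlink(path_out);             /* do not leave a partial file behind */
        return -1;
    }
    return 0;
}
```

The resulting path could then be passed to sample_api_open, and the file removed with unlink() once the API has finished with it.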

How to get memory address from file path without opening file

I have a file. I know its file path, and I want to know its memory address. My problem is that I can't actually open the file (fopen can't be used because the OS thinks the file is in use) to get its memory address.
Ex.
fopen("C:/example/file", "r") returns null
From what I understand, the OS returns the memory address after it confirms the file isn't in use. So is it even possible to bypass the OS?
@Alter, by finding the process ID of the process that holds locks on the file, you could get somewhere... You might be able to track the file's contents in memory as part of the memory space allocated to that process.
However, just because a file is locked does not at all mean that the file is in memory. Sometimes just a part of a file is used, like the functions within a DLL, where only the 'used' and necessary chunks of the file would be in memory. Other times, the entire document (file) will be present very nicely and contiguously in memory (consider a text file open in Notepad). It is also possible that the file is locked purely as a placeholder, where the lock is all that matters and none of the file is actually loaded. You really need to know a lot about the process that has locks on the file.
Now if you simply want to copy the file to another file, then launch the copy before the 'Process' locks the file. You could try a batch file that runs at Windows Startup - and see if that is early enough to copy the file before a lock is placed on it.

int filedes? system calls read and write

Can any of you guys tell me what "int filedes" refers to?
http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html
I've noticed I can put any int in there and it seems to work but I don't know what it's for...
Thanks.
It's a file descriptor. See http://en.wikipedia.org/wiki/File_descriptor. Since it is an index into a table of open files and pipes, several different descriptors may return valid data. 0=stdin, 1=stdout and 2=stderr exist by default, or you can use the open function to create your own.
The very first sentence of the description says, "the file associated with the open file descriptor, fildes". In other words, it indicates the file you're reading from. If your read function call works no matter what file descriptor you pass it, your program isn't doing what you think it is.
Somewhere inside the kernel, there is a per-process table of file descriptor entries. The file descriptor itself is an integer index into that table, and each entry is a structure which describes the state of the open file. What kind of information does an entry hold? First of all, the position from which the next read/write operation will be performed. Then, the access mode of the file, specified by the open system call. And last but not least, a data structure which represents the on-disk information of the file. In *nix, this is an inode structure. Here, the main question to be answered is: where do the blocks of the file reside on the disk? If you have the inode of a file in memory, you can quickly find where the Nth block of the file is (which means you don't need to parse the path every time and scan each directory in the path to resolve the inode).
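To check whether an arbitrary int really names an open descriptor, it helps to look at the return value of read() rather than assume it worked. A minimal sketch, with the helper name read_probe made up for this example:

```c
#include <errno.h>
#include <unistd.h>

/* Try to read one byte from `fd`. Returns 0 if read() succeeded (reaching
 * end of file also counts as success), otherwise the errno value.
 * Note that a successful probe consumes one byte from the descriptor.
 * (read_probe is a hypothetical helper name.) */
int read_probe(int fd)
{
    char byte;
    ssize_t n = read(fd, &byte, 1);
    return n >= 0 ? 0 : errno;
}
```

For an integer that does not refer to an open descriptor, read() returns -1 and sets errno to EBADF, so a call that "seems to work" with any number is most likely having its error return ignored.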

Does "opening a file" mean loading it completely into memory?

There's an AudioFileOpenURL function which opens a file. With AudioFileReadPackets that file is accessed to read packets. But one thing that sticks in my mind is: does AudioFileOpenURL actually load the whole monster into memory? Or is that a lightweight operation?
So is it possible to read data from a file, only a specific portion, without having the whole terabytes of stuff in memory?
Does AudioFileOpenURL actually load the whole monster into memory?
No, it just gets you a file pointer.
Or is that a lightweight operation?
Yep, fairly lightweight. Just requires a filesystem lookup.
So is it possible to read data from a file, only a specific portion, without having the whole terabytes of stuff in memory?
Yes, you can use fseek to go to a certain point in the file, then fread to read it into a buffer (or AudioFileReadBytes).
No, it doesn't load the entire file into memory. "Opens a file" returns a handle to you allowing you to read from or write to a file.
I don't know about Objective-C, but in most languages opening the file just gives you the ability to then access the contents with a read operation. In your case, you can perform a seek to move the file pointer to the desired location, then read the number of bytes you need.
AudioFileOpenURL will open(2) the file and read the necessary info (4096 bytes) to determine the audio type.
open(2) won't load the whole file into RAM.
(AudioFileOpenURL is a C API, not Objective-C.)

Resources