Does "opening a file" mean loading it completely into memory? - c

There's an AudioFileOpenURL function which opens a file. With AudioFileReadPackets, that file is accessed to read packets. But one thing that's stuck in my brain is: does AudioFileOpenURL actually load the whole monster into memory? Or is it a lightweight operation?
So is it possible to read only a specific portion of data from a file, without having the whole terabytes of stuff in memory?

Does AudioFileOpenURL actually load the whole monster into memory?
No, it just gets you a file pointer.
Or is it a lightweight operation?
Yep, fairly lightweight. It just requires a filesystem lookup.
So is it possible to read only a specific portion of data from a file, without having the whole terabytes of stuff in memory?
Yes. You can use fseek to go to a certain point in the file, then fread to read it into a buffer (or AudioFileReadBytes).
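To make that concrete, here is a minimal sketch of the fseek/fread pattern in plain C; the file name and offset are made up:

#include <stdio.h>
#include <stdlib.h>

/* Read a 4 KB slice starting 1 MB into the file, without ever
   loading the rest of it into memory. */
int main(void)
{
    FILE *f = fopen("huge.dat", "rb");             /* hypothetical file */
    if (f == NULL) {
        perror("fopen");
        return EXIT_FAILURE;
    }

    char buffer[4096];
    if (fseek(f, 1024L * 1024L, SEEK_SET) != 0) {  /* jump to byte 1,048,576 */
        perror("fseek");
        fclose(f);
        return EXIT_FAILURE;
    }

    size_t n = fread(buffer, 1, sizeof buffer, f); /* at most 4096 bytes */
    printf("read %zu bytes\n", n);

    fclose(f);
    return EXIT_SUCCESS;
}

Only the 4096-byte buffer ever lives in memory, no matter how large the file is.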

No, it doesn't load the entire file into memory. "Opening a file" returns a handle that allows you to read from or write to the file.

I don't know about Objective-C, but in most languages opening the file just gives you the ability to then access the contents with a read operation. In your case, you can perform a seek to move the file pointer to the desired location, then read the number of bytes you need.

AudioFileOpenURL will open(2) the file and read the necessary info (4096 bytes) to determine the audio type.
open(2) won't load the whole file into RAM.
(AudioFileOpenURL is a C API, not Objective-C.)
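For concreteness, a sketch of that C API in use (macOS/iOS, linked against AudioToolbox; the path and byte offset here are invented):

#include <AudioToolbox/AudioToolbox.h>

/* Open the audio file, then pull 4096 bytes from an arbitrary offset;
   nothing beyond the header and that slice is read from disk. */
static void readSlice(void)
{
    CFURLRef url = CFURLCreateWithFileSystemPath(kCFAllocatorDefault,
                                                 CFSTR("/path/to/huge.caf"),
                                                 kCFURLPOSIXPathStyle, false);
    AudioFileID file;
    OSStatus err = AudioFileOpenURL(url, kAudioFileReadPermission, 0, &file);
    CFRelease(url);
    if (err == noErr) {
        char   buffer[4096];
        UInt32 byteCount = sizeof buffer;
        err = AudioFileReadBytes(file, false, 100000, &byteCount, buffer);
        /* byteCount now holds the number of bytes actually read */
        AudioFileClose(file);
    }
}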

Related

How does one write files to disk, sequentially, in C?

I want to write a program that writes data as one contiguous block to disk, so that when I read the data back, I can just read one long series of bytes without stopping. Are there any references I can be directed to regarding this issue?
I am essentially asking whether it is possible to write data for multiple files contiguously and then read past an EOF (or several) to retrieve the data that was written.
I am aware of fwrite and fopen, I just want to be sure that the data being written to disk is contiguous.
This is filesystem-dependent. You'll want to look at extents, which are contiguous areas of storage reserved for a file.
On Windows you can open an unformatted volume with CreateFile and then WriteFile a contiguous block of data. It won't be a file, but you will be able to read it back as you stated.
According to this, NTFS tries to allocate contiguous space if possible; your chances are lower when appending, though.
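A hedged sketch of the raw-volume route mentioned above (Windows, run elevated; "E:" stands in for a hypothetical unformatted volume, and raw access requires transfer lengths that are a multiple of the sector size):

#include <windows.h>
#include <stdio.h>

/* Open the volume itself rather than a file on it, then write one
   contiguous, sector-aligned block at the current position. */
int main(void)
{
    HANDLE h = CreateFileA("\\\\.\\E:",
                           GENERIC_READ | GENERIC_WRITE,
                           FILE_SHARE_READ | FILE_SHARE_WRITE,
                           NULL, OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        fprintf(stderr, "CreateFile failed: %lu\n", GetLastError());
        return 1;
    }

    static char block[4096];        /* multiple of the sector size */
    DWORD written;
    if (!WriteFile(h, block, sizeof block, &written, NULL))
        fprintf(stderr, "WriteFile failed: %lu\n", GetLastError());

    CloseHandle(h);
    return 0;
}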

Load file content into memory, C

I will be dealing with really huge files, for which I want to load only part of the content into memory. So I was wondering whether the command:
FILE *file = fopen("my/link/file.txt", "r");
loads the whole file content into memory, or whether it just returns a pointer to the content. After I open the file, I use fgets() to read it line by line.
And what about fwrite()? Do I need to open and close the file every time I write something, so it doesn't get overloaded, or is that managed in the background?
Another thing: is there a nice bash command like time which could tell me the peak memory usage of my executed program? I am using OS X.
As per the man page for fopen(),
The fopen() function opens the file whose name is the string pointed to by path and associates a stream with it.
So, no, it does not load the content of the file into memory or elsewhere.
To operate on the returned file pointer, as you already know, you need to use fgets() and family.
Also, once you open the file and get a pointer, as long as you don't fclose() it you can use that pointer any number of times to write to the file (remember to open the file in append mode). You don't need to open and close it for every read and write.
Also, FWIW, if you want to move the file pointer back and forth, fseek() can come in handy.
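To illustrate, a minimal sketch using the question's path (the buffer size is arbitrary):

#include <stdio.h>

/* Stream the file line by line: only one line's worth of data (the
   "line" buffer) is in memory at any moment, however large the file. */
int main(void)
{
    FILE *file = fopen("my/link/file.txt", "r");
    if (file == NULL)
        return 1;

    char line[1024];
    while (fgets(line, sizeof line, file) != NULL) {
        /* process one line here */
    }

    fseek(file, 0L, SEEK_SET);   /* move back to the start if needed */
    fclose(file);
    return 0;
}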
fopen does not load all of the file into memory. It creates a file descriptor for the file, like a pointer to an entry in the open file table.
In the open file table there is a pointer to the location of the file on disk.
If you want to move to a particular place in the file, use fseek.
Another option is to use mmap. This creates a new mapping in the virtual address space of the calling process, and you can then access the file like an array. (Not all of the file is loaded into memory; the paging mechanism loads the data on demand.)
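A minimal sketch of the mmap route, reusing the path from the question:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the file and touch one byte deep inside it; the kernel pages in
   only the page around that byte, not the whole file. */
int main(void)
{
    int fd = open("my/link/file.txt", O_RDONLY);
    if (fd == -1)
        return 1;

    struct stat st;
    if (fstat(fd, &st) == -1 || st.st_size == 0)
        return 1;

    char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                       /* the mapping survives the close */
    if (data == MAP_FAILED)
        return 1;

    printf("byte in the middle: %c\n", data[st.st_size / 2]);

    munmap(data, st.st_size);
    return 0;
}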
fopen does not read the file, fread and fgets and similar functions do.
Personally I've never tried reading and writing a file at the same time.
It should work, though.
You can use multiple file pointers to the same file.
There is no command like time for memory consumption. The simplest way is to look at top. There exist malloc/new replacement libraries which can do that for you.
loads the whole file content into memory, or whether it just returns a pointer to the content?
No.
fopen() opens the file with the specified filename and associates it with a stream that can be identified by the FILE pointer.
fread() can be used to get file contents into a buffer.
Multiple read/write operations can be carried out without any need to open the file multiple times.
Functions like rewind() and fseek() can be used to change the position of the cursor in the file.

Windows C: the inverse of CreateFileMapping

I have a rather odd program where I need to load a file into memory, close that file handle, and then use the in-memory file image like a file (where I use ReadFile and WriteFile with a HANDLE)... so basically I'm looking at doing the inverse of CreateFileMapping. Is this possible within the Windows API?
Thanks!
When you call CreateFile, you can use FILE_ATTRIBUTE_TEMPORARY. This attempts to hold the data for the file in RAM if possible, but it does not guarantee it -- the data could be written out to disk if memory gets low enough.
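A sketch of that hint in use (the file name is invented, and pairing it with FILE_FLAG_DELETE_ON_CLOSE is a common convention, not something the answer requires):

#include <windows.h>

/* FILE_ATTRIBUTE_TEMPORARY asks the cache manager to keep the data in
   RAM if possible; it is a hint, not a guarantee, and the data can
   still be flushed to disk under memory pressure. */
int main(void)
{
    HANDLE h = CreateFileA("scratch.bin",
                           GENERIC_READ | GENERIC_WRITE,
                           0, NULL, CREATE_ALWAYS,
                           FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE,
                           NULL);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    DWORD written;
    WriteFile(h, "hello", 5, &written, NULL);
    /* ReadFile/SetFilePointer work on h like any other file handle */
    CloseHandle(h);   /* DELETE_ON_CLOSE removes the file here */
    return 0;
}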

How to copy a ram-based file to disk efficiently

I want to copy a large ram-based file (located in the /dev/shm directory) to local disk. Is there some way to copy it efficiently, instead of reading it char by char or allocating another piece of memory? I can use only the C language here. Is there any way I can put the memory file directly onto disk? Thanks!
I would mmap() the files and do memcpy() between them.
Thank you guys for the help! I made it work by mmap'ing the ram-based file and writing the entire block directly to the destination. memcpy was not used because I am actually writing to a parallel file system (PVFS), which does not support the mmap operation.
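For reference, a sketch of that accepted approach (the source path is the question's example; the destination is an assumption):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the ram-based source and hand the whole block to write(); a real
   program would loop on short writes and check errno. */
int main(void)
{
    int src = open("/dev/shm/tmp", O_RDONLY);
    int dst = open("/tmp/copy", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (src == -1 || dst == -1)
        return 1;

    struct stat st;
    if (fstat(src, &st) == -1 || st.st_size == 0)
        return 1;

    void *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, src, 0);
    if (data == MAP_FAILED)
        return 1;

    if (write(dst, data, st.st_size) != st.st_size)
        perror("write");

    munmap(data, st.st_size);
    close(src);
    close(dst);
    return 0;
}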
/dev/shm is shared memory, so one way to copy it would be to open it as shared memory, but frankly I don't think you will gain anything.
When writing your memory file to disk, the bottleneck will be the disk.
Just be sure to write the data in big chunks, and you should be fine.
You can just copy it like any other file:
cp /dev/shm/tmp ~/tmp
So, a quick, simple way is to issue a cp command via system().
You could try to see if the splice system call works for this. I'm not sure if it will, since it has some restrictions on the types of files it can work with, but if it does work you would call it repeatedly with memory-page-sized (or some multiple of the page size) requests until it finished, and the kernel would handle it very efficiently.
If this doesn't work you'll need to do either mmap or do plain old read/write.
Reading and writing in memory-page-sized chunks makes things much more efficient. It can be even more efficient if your buffers are page-aligned, since that opens up the opportunity for the kernel to just move the data to/from your process's memory via memory-management trickery rather than actually copying it around.
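As a sketch of the splice idea under assumed paths: splice(2) requires one side of each call to be a pipe, so a file-to-file copy bounces through an intermediate pipe, and the kernel moves the pages without copying them through userspace.

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int src = open("/dev/shm/tmp", O_RDONLY);
    int dst = open("/tmp/copy", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    int p[2];
    if (src == -1 || dst == -1 || pipe(p) == -1)
        return 1;

    for (;;) {
        ssize_t n = splice(src, NULL, p[1], NULL, 65536, SPLICE_F_MOVE);
        if (n <= 0)
            break;                     /* 0 = EOF, -1 = error */
        while (n > 0) {                /* drain the pipe into the file */
            ssize_t m = splice(p[0], NULL, dst, NULL, n, SPLICE_F_MOVE);
            if (m <= 0)
                return 1;
            n -= m;
        }
    }
    close(src);
    close(dst);
    return 0;
}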
The only thing you can do is read() in page-size-aligned chunks. I'm assuming you need to guarantee that the data is written, which means bypassing buffers via posix_fadvise() or using O_DIRECT (I typically use posix_fadvise(), but O_DIRECT is appropriate here).
In that case, the speed of the media being written to alone dictates how quickly this will happen.
If you don't need to bypass buffers, the operation will complete faster, but there's no guarantee that the data will actually be written in the event of a reboot / power outage / etc. Since the source of the data is in shared memory, I'm (again) guessing you want the write to be guaranteed.
The only thing you can optimize is how long it takes read() to get data from shared memory into your own address space, which page-aligned chunks will improve.
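A sketch of the buffer-bypassing variant under assumptions (Linux-style O_DIRECT and an invented output path; the answer's usual posix_fadvise() route would instead do normal writes followed by posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED)):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* O_DIRECT demands that the buffer, file offset, and transfer length
   all be suitably aligned; posix_memalign() supplies the buffer. */
int main(void)
{
    int fd = open("/tmp/out.bin", O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd == -1)
        return 1;

    void *buf;
    if (posix_memalign(&buf, 4096, 4096) != 0)   /* page-aligned buffer */
        return 1;

    /* ... fill buf from the shared-memory source ... */

    if (write(fd, buf, 4096) != 4096)
        return 1;
    fsync(fd);    /* make sure the data has actually reached the media */
    close(fd);
    free(buf);
    return 0;
}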

How exactly is a file opened for reading/writing by applications like msword/pdf?

What I want are the steps that an application takes in order to open a file and allow the user to read it. A file is nothing more than a sequence of bits on the disk. What steps does it take to show the contents of the file?
I want to do this programmatically in C. I don't want to begin with complex formats like Word/PDF, but something simpler. So, which format is best?
If you want to investigate this, start with plain ASCII text. It's just one byte per character, very straightforward, and you can open it in Notepad or any one of its much more capable replacements.
As for what actually happens when a program reads a file... basically it involves making a system call to open the file, which gives you a file handle (just a number that the operating system maps to a record in the filesystem). You then make a system call to read some data from the file, and the OS fetches it from the disk and copies it into a region of RAM that you specify (that would be a character/byte array in your program). Repeat reading as necessary. And when you're done, you issue yet another system call to close the file, which simply tells the OS that you're done with it. So the sequence, in C, is

#include <stdio.h>

int main(void)
{
    FILE *f = fopen("some.txt", "rb");   /* hypothetical file name */
    unsigned char foo[4096];             /* one block's worth of buffer */
    size_t n;
    while (f && (n = fread(foo, 1, sizeof foo, f)) > 0) {
        /* do something with the n bytes in foo */
    }
    if (f)
        fclose(f);
    return 0;
}
If you're interested in what the OS actually does behind the scenes to get data from the disk to RAM, well... that's a whole other can of worms ;-)
Start with plain text files
