I need to write some system log data (usually not more than 100 characters at a time) into a log file based on particular events. But the size of this log file is small (say around 4KB), and I need to wrap around the logs when the file size hits the limit. While wrapping around, I need to preserve the latest info, and later on present it in chronological order as it was written to the file. What is the best way to do this? I want to avoid making copies of the file to do this.
To write to a restricted file:
call ftell to find out where you are in the file
call fwrite to write as much as you can, with respect to restricted size
if you couldn't write the whole message
call fseek to return to the start of the file
call fwrite to write the remainder of the message
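For illustration, here is a minimal sketch of that algorithm, assuming the file is already open in a suitable mode; MAX_LOG_SIZE (4 KB here) and log_wrap_write are names invented for this example:

#include <stdio.h>
#include <string.h>

#define MAX_LOG_SIZE 4096   /* the fixed size of the log file */

void log_wrap_write(FILE *fp, const char *msg)
{
    size_t len = strlen(msg);
    long pos = ftell(fp);                          /* where are we in the file? */
    size_t room = (size_t)(MAX_LOG_SIZE - pos);    /* bytes left before the limit */

    if (len <= room) {
        fwrite(msg, 1, len, fp);                   /* the whole message fits */
    } else {
        fwrite(msg, 1, room, fp);                  /* write as much as fits ... */
        fseek(fp, 0, SEEK_SET);                    /* ... return to the start ... */
        fwrite(msg + room, 1, len - room, fp);     /* ... and write the remainder */
    }
    fflush(fp);
}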
To meet your modified requirements, you will need to use a record-based file. Choose a record size slightly bigger than the largest message and give each message a timestamp. The algorithm I described above still works, except that you go back to the start of the file whenever a whole record won't fit. You will also need to write a small application that reads the file and presents the contents in chronological order (sorting by timestamp).
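A minimal record layout along those lines, assuming a 128-byte record size and a 4 KB file; the names log_record and log_write_record are purely illustrative:

#include <stdio.h>
#include <string.h>
#include <time.h>

#define RECORD_SIZE 128     /* slightly bigger than the largest message */
#define LOG_FILE_SIZE 4096

struct log_record {
    time_t timestamp;                              /* lets the reader sort chronologically */
    char   text[RECORD_SIZE - sizeof(time_t)];
};

void log_write_record(FILE *fp, const char *msg)
{
    struct log_record rec = {0};
    rec.timestamp = time(NULL);
    strncpy(rec.text, msg, sizeof(rec.text) - 1);

    if (ftell(fp) + (long)sizeof(rec) > LOG_FILE_SIZE)   /* next record won't fit: */
        fseek(fp, 0, SEEK_SET);                          /* wrap to the start */
    fwrite(&rec, sizeof(rec), 1, fp);
}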
Alternatively, investigate using an existing logging library like log4c.
I have a log file, and I need to remove some number of bytes, say n bytes, from the start of the file only. The problem is that the file is referenced by file pointers held open in other programs, and those programs may write to the log at any time. I can't recreate it as a new file, because then the existing file pointers would misbehave (although I'm not entirely sure about that).
I tried searching, but every suggestion I found involves rewriting the data to a new file.
Is there any solution for this?
I can suggest two options:
Ring buffer. Use a memory-mapped file as your logging medium, and treat it as a ring buffer. You will need to track where the last written byte is yourself, and wrap around appropriately when you step past the end of the ring. This way your logging file stays a constant size, but you can't tail it like a regular file; instead, you will need a small program that knows how to walk the ring buffer when you want to display the log (a rough sketch follows after the second option).
Multiple small log files. Log to some number of smaller files, and remove the oldest one whenever the collection grows beyond the total amount of log data you want to keep. If the most recent log file always has the same name, you can use the standard tail -F utility to follow the log perpetually. To avoid having multiple programs manipulate the same file, your logging code can send log messages to a single logging daemon.
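A rough sketch of the ring-buffer option, assuming a POSIX system with mmap(); the ring layout (a single write offset stored at the front of the file) and the names ring_write and RING_SIZE are inventions for illustration, and error checking is omitted:

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define RING_SIZE 4096

struct ring {
    size_t write_off;                        /* persisted so a reader can find the newest byte */
    char   data[RING_SIZE - sizeof(size_t)];
};

static void ring_write(struct ring *r, const char *msg, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        r->data[r->write_off] = msg[i];
        r->write_off = (r->write_off + 1) % sizeof(r->data);   /* wrap around the ring */
    }
}

int main(void)
{
    int fd = open("app.ring", O_RDWR | O_CREAT, 0644);
    ftruncate(fd, sizeof(struct ring));      /* the file keeps this size forever */
    struct ring *r = mmap(NULL, sizeof(struct ring),
                          PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    ring_write(r, "some event happened\n", strlen("some event happened\n"));
    munmap(r, sizeof(struct ring));
    close(fd);
    return 0;
}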
So... you want to change the file, but you cannot. The reason you cannot is that other programs are using the file. In general terms, you appear to need to:
stop all the other programs messing with the file while you change it -- to chop the now-unwanted stuff off the front;
inform the other programs that you have changed it -- so they can re-establish their file-pointers.
I guess there must be a mechanism to allow the other programs to change the file without tripping over each other... so perhaps you can extend that? [If all the other programs are children of the main program, and the children all open the file with O_APPEND, you have a fighting chance of doing this, perhaps with the help of a file lock or a semaphore (which may already exist?). But if the programs are this intimately related, then @jxh has other, probably better, suggestions.]
But, if you cannot change the other programs in any way, you appear to be stuck, except...
...perhaps you could try 'sparse' files? On (recent-ish) Linux, at least, you can fallocate() with FALLOC_FL_PUNCH_HOLE to remove the stuff you don't want without affecting the other programs' file pointers. Of course, sooner or later the other programs' file offsets may overflow, but that may be a more theoretical than practical issue.
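A Linux-specific sketch of that idea, assuming a filesystem that supports hole punching (e.g. ext4 or XFS); punch_front is a made-up helper name:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int punch_front(const char *path, off_t nbytes)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    /* PUNCH_HOLE must be paired with KEEP_SIZE: the blocks are freed, but the
       file size and everyone else's file pointers are left untouched. */
    int rc = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, 0, nbytes);
    if (rc != 0)
        perror("fallocate");
    close(fd);
    return rc;
}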
I'm wondering if there is a mechanism that reads a file while it is being written and simultaneously removes the content that has already been read. The reason for doing this is that the file is stored in memory (a ramdisk), and as the file grows, we need to remove the parts that have already been processed.
Thanks a lot!!!
PS: I'm using Linux and Java for this. :)
Data cannot be removed from the beginning or middle of a file. Process the data using multiple files and erase them as they are consumed.
Reading from a file while it is being written to is no big deal; that is the purpose of every tail program. However, deleting already-read content from an open file... I don't think that is possible.
You may want to consider a workaround. For example, you can have a number of files {0,n}, each with the same byte limit. Write to file_i, where i is the highest number currently in use, until it reaches the limit, then move on to the next one. Reading starts from the lowest-numbered file_i, reads up to the limit, and deletes each file once it has been consumed.
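A sketch of that scheme, where the file names, LIMIT, and the two helper functions are arbitrary choices for illustration:

#include <stdio.h>

#define LIMIT 4096

/* Writer side: open the current segment, rolling over to the next number
   once the current one has reached LIMIT bytes. */
FILE *open_current_segment(int *index)
{
    char name[64];
    snprintf(name, sizeof(name), "file_%d.log", *index);
    FILE *fp = fopen(name, "a");
    if (fp) {
        fseek(fp, 0, SEEK_END);
        if (ftell(fp) >= LIMIT) {           /* segment full: move to the next one */
            fclose(fp);
            (*index)++;
            snprintf(name, sizeof(name), "file_%d.log", *index);
            fp = fopen(name, "a");
        }
    }
    return fp;
}

/* Reader side: after consuming file_i completely, delete it. */
void consume_segment(int index)
{
    char name[64];
    snprintf(name, sizeof(name), "file_%d.log", index);
    /* ... read and process the contents ... */
    remove(name);
}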
We still haven't heard what OS our friend user2386567 is using, but as a counterpoint to the other answers declaring that it's impossible to delete data from the middle of a file, I'd like to point out that Linux has FALLOC_FL_PUNCH_HOLE for that exact purpose.
I am writing an audio file to an SD/MMC storage card in real time, in WAVE format, working on an ARM board. Said card is (and must remain) in FAT32 format. I can write a valid WAVE file just fine, provided I know how much I'm going to write beforehand.
I want to be able to put placeholder data in the Chunk Data Size field of the RIFF and data chunks, write my audio data, and then go back and update the Chunk Data Size field in those two chunks so that they have correct values, but...
I have a working filesystem and some stdio functions, with some caveats:
fopen() supports the r, w, and a modes, but not any of the + modes.
fseek() does not work in write mode.
I did not write the implementations of the above functions (I am using ARM's RL-FlashFS), and I am not certain what the justification for the restrictions/partial implementations is. Adding the missing functionality myself is probably an option, but I would like to avoid it if possible (I have no other need for those features, do not foresee any, and can't really afford to spend too much time on it). Switching to a different implementation is also not an option here.
I have very limited memory available, and I do not know how much audio data will be received, except that it will almost certainly be more than I can keep in memory at any one time.
I could write a file with the raw interleaved audio data in it while keeping track of how many bytes I write, close it, then open it again for reading, open a second file for writing, write the header into the second file, and copy the audio data over. That is, I could post-process it into a properly formatted valid WAVE file. I have done this and it works fine. But I want to avoid post-processing large amounts of data if at all possible.
Perhaps I could somehow concatenate two files in place? (I.e. write the data, then write the chunks to a separate file, then join them in the filesystem, avoiding much of the time spent copying potentially vast amounts of data.) My understanding of that is that, if possible, it would still involve some copying due to block orientation of the storage.
Suggestions?
EDIT:
I really should have mentioned this, but there is no OS running here. I have some stdio functions running on top of a hardware abstraction layer, and that's about it.
This should be possible, but it involves writing a set of FAT table manipulation routines.
The concept of FAT is simple: A file is stored in a chain of "clusters" - fixed size blocks. The clusters do not have to be contiguous on the disk. The Directory entry for a file includes the ID of the first cluster. The FAT contains one value for each cluster, which is either the ID of the next cluster in the chain, or an "End-Of-Chain" (EOC) marker.
So you can concatenate files together by altering the first file's EOC marker to point to the head cluster of the second file.
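An in-memory illustration of that splice, treating the FAT as a plain array of 32-bit entries (real FAT32 code also has to deal with reserved bits, both FAT copies, and the on-disk layout); the EOC value is simplified here:

#include <stdint.h>

#define EOC 0x0FFFFFFF   /* simplified end-of-chain marker */

/* Graft the chain starting at second_head onto the cluster first_tail,
   which is currently the last cluster of the first file. */
void fat_concat(uint32_t *fat, uint32_t first_tail, uint32_t second_head)
{
    if (fat[first_tail] == EOC)
        fat[first_tail] = second_head;
}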
For your application you could write all the data, rewrite the first cluster (with the correct header) into a new file, then do FAT surgery to graft the new head onto the old tail:
Determine the FAT cluster size (S)
Determine the size of the WAV header up to the first data byte (F)
Write the incoming data to a temp file. Close when stream ends.
Create a new file with the desired name.
Open the temp file for reading, and copy the header to the new file while filling in the size field(s) correctly (as you have done before).
Write min(S-F, bytes_remaining) to the new file.
Close the new file.
If there are no bytes remaining, you are done. Otherwise:
Read the FAT and Directory into memory.
Read the Directory to get
the first cluster of the temp file (T1) (with all the data),
the first cluster of the WAV file (W1) (with the correct header).
Read the FAT entry for T1 to find the second temp cluster (T2).
Change the FAT entry for W1 from "EOC" to T2.
Change the FAT entry for T1 from T2 to "EOC".
Swap the FileSize entries for the two files in the Directory.
Write the FAT and Directory back to disk.
Delete the Temp file.
Of course, by the time you do this, you will probably understand the file system well enough to implement fseek(fp,0,SEEK_SET), which should give you enough functionality to do the header fixup through standard library calls.
We are working with exactly the same scenario as you in our project - a recorder application. Since the length of the file is unknown, we write a RIFF header with zero-length fields at the beginning (to reserve space), and on closing we go back to position 0 (with fseek) and write the correct header. Thus, I think you have to debug why fseek doesn't work in write mode; otherwise you will not be able to perform this task efficiently.
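For reference, this is the usual pattern on a full stdio implementation, assuming the canonical 44-byte PCM WAV header (where the RIFF chunk size lives at offset 4 and the data chunk size at offset 40) and a little-endian host, as most ARM boards are; finalize_wav_sizes is an illustrative name:

#include <stdio.h>
#include <stdint.h>

void finalize_wav_sizes(FILE *fp, uint32_t data_bytes)
{
    uint32_t riff_size = 36 + data_bytes;   /* everything after "RIFF" + the size field */

    fseek(fp, 4, SEEK_SET);                 /* RIFF chunk size field */
    fwrite(&riff_size, sizeof(riff_size), 1, fp);

    fseek(fp, 40, SEEK_SET);                /* data chunk size field */
    fwrite(&data_bytes, sizeof(data_bytes), 1, fp);
}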
By the way, you would be better off staying away from filesystem-internal workarounds like concatenating blocks; that is hardly feasible, will not be portable, and can bring you new problems. Use standard, proven methods instead.
Update
(After finding out that your FS is ARM's RL-FlashFS:) why not use rewind() (http://www.keil.com/support/man/docs/rlarm/rlarm_rewind.htm) instead of fseek()?
I am interested in writing a utility that modifies PostScript files. It needs to traverse the file, make certain decisions about the page count and dimensions, and then write the output to a file or stdout making certain modifications to the PostScript code.
What would be a good way to handle file processing on a *NIX system in this case? I'm fairly new to pipes and forking in C. My understanding is that if I read a file directly, I can seek back and forth within it, but if input is piped directly into the program, I can't simply rewind to the beginning, since the input could be a network stream, for example. Is that correct?
Rather than store the entire PS file in memory, which could grow huge, it seems to make more sense to buffer the input to disk while doing my first pass of page analysis, then re-read from the temporary file, produce the output, and remove the temporary file. If that's a viable solution, where would be a good place to store such a file on a *NIX system? I'm also not sure how safe such code would be: the program could potentially be used by multiple users on the same server. It sounds like I would have to make sure to save the file in a temporary directory unique to a given user account, as well as give the temporary file a fairly unique name on disk.
Would appreciate any tips and pointers on this crazy puzzling world of file processing.
Use mkstemp(3) to create your temporary file. It will handle concurrency issues for you. mmap(2) will let you move around in the file with abandon.
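A minimal sketch of that combination: spool the piped input to a safely created temporary file, then map it for random access during the second pass. The template name and buffer size are arbitrary, and error handling is trimmed for brevity:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    char tmpl[] = "/tmp/ps-rewrite-XXXXXX";   /* mkstemp fills in the XXXXXX */
    int fd = mkstemp(tmpl);
    unlink(tmpl);                              /* file disappears once fd is closed */

    char buf[4096];
    ssize_t n;
    off_t total = 0;
    while ((n = read(STDIN_FILENO, buf, sizeof(buf))) > 0) {   /* first pass: spool stdin */
        write(fd, buf, n);
        total += n;
    }

    char *ps = mmap(NULL, total, PROT_READ, MAP_PRIVATE, fd, 0);
    /* ... second pass: analyse pages, produce output to stdout ... */
    munmap(ps, total);
    close(fd);
    return 0;
}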
if input is directly piped into the program, I can't simply rewind to the beginning of an input as the input could be a network stream for example, correct?
That's correct. You can only perform random access on a file.
If you read the file, perhaps you could build a table of metadata, which you can use to seek specific portions of the file later, without keeping the file itself in memory.
/tmp is the temporary directory on unix systems. It's specified by FHS. It's cleaned out when the system is rebooted.
If you need more persistent data storage than that there's /var/tmp which is not cleaned out after reboots. Also FHS.
http://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard
What I want are the steps that an application takes in order to open a file and let the user read it. A file is nothing more than a sequence of bits on the disk. What steps does it take to show the contents of the file?
I want to do this programmatically in C. I don't want to begin with complex formats like Word/PDF, but something simpler. So, which format is best?
If you want to investigate this, start with plain ASCII text. It's just one byte per character, very straightforward, and you can open it in Notepad or any one of its much more capable replacements.
As for what actually happens when a program reads a file... basically it involves making a system call to open the file, which gives you a file handle (just a number that the operating system maps to a record in the filesystem). You then make a system call to read some data from the file, and the OS fetches it from the disk and copies it into some region of RAM that you specify (that would be a character/byte array in your program). Repeat reading as necessary. And when you're done, you issue yet another system call to close the file, which simply tells the OS that you're done with it. So the sequence, in C-like pseudocode, is
FILE *f = fopen(...);
while (...) {
    char foo[BLOCK_SIZE];
    fread(foo, 1, BLOCK_SIZE, f);
    /* do something with foo */
}
fclose(f);
If you're interested in what the OS actually does behind the scenes to get data from the disk to RAM, well... that's a whole other can of worms ;-)
Start with plain text files