I'm making a server that can read/write a file concurrently (from goroutines spawned by net/http handlers). I've created the following struct:
type transactionsFile struct {
    File  *os.File
    Mutex *sync.RWMutex
}
I'm initializing the file once in the init() function. Should I close it somehow after each write operation?
You can't write to a closed file, so if you close it after each write operation, you also have to (re)open it before each write.
This would be quite inefficient, so instead leave it open and close it only when your app is about to terminate. (Closing is required because File.Write() does not guarantee that the data has been written to disk by the time it returns.) Since you're writing the file from HTTP handlers, you should implement graceful server termination and close the file after that. See Server.Shutdown() for details.
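A minimal sketch of that shutdown sequence, modeled on the net/http Shutdown example (srv and idleConnsClosed are placeholder names; tf is your struct above, assumed to be initialized in init()):

package main

import (
    "context"
    "log"
    "net/http"
    "os"
    "os/signal"
)

var tf transactionsFile // initialized in init(), as in the question

func main() {
    srv := &http.Server{Addr: ":8080"}

    idleConnsClosed := make(chan struct{})
    go func() {
        sigint := make(chan os.Signal, 1)
        signal.Notify(sigint, os.Interrupt)
        <-sigint

        // Stop accepting new requests and wait for in-flight handlers.
        if err := srv.Shutdown(context.Background()); err != nil {
            log.Printf("HTTP server Shutdown: %v", err)
        }
        close(idleConnsClosed)
    }()

    if err := srv.ListenAndServe(); err != http.ErrServerClosed {
        log.Fatalf("HTTP server ListenAndServe: %v", err)
    }
    <-idleConnsClosed

    // All handlers have returned; only now is it safe to close the file.
    if err := tf.File.Close(); err != nil {
        log.Printf("closing transactions file: %v", err)
    }
}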
Also, if the purpose of your shared file writing is to create some kind of logger, you could take advantage of the log package, so you would not have to use a mutex. For details, see net/http set custom logger.
I'm writing an HTTP/2 server in C, using epoll. Let's say a client asks for /index.html - I need to open a file descriptor pointing to that file and then send it back to the socket whenever I read a chunk of it. So I'd have an event loop that looks something like this:
while (1) {
    int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
    for (int i = 0; i < n; i++) {
        if (is_socket(events[i].data.fd))     /* however you tell sockets from files */
            handle_socket_io(&events[i]);     /* handle socket I/O */
        else                                  /* event is on a disk file */
            read_file_and_send(&events[i]);   /* read as much as possible,
                                                 send to the associated socket */
    }
}
However, this poses a problem. If the socket then closes (for whatever reason), the file descriptor for index.html will get closed too. But it's possible that the index.html FD has already been queued for reading (i.e., it's already in events, since it was closed between calls to epoll_wait), so when the for loop gets to processing that FD I'll be accessing a 'dangling' FD.
If this were a single-threaded program I'd try to hack around the issue by tracking file descriptor numbers, but unfortunately I'm running the same epoll loop on multiple threads, which means I can't predict which FD numbers will be in use at any given moment. It's entirely possible that by the time the invalid read on the file comes around, another thread will have claimed that FD, so the call to read() won't explicitly fail, but I'll probably get a use-after-free anyway by trying to send on a socket that no longer exists.
What's the best way of dealing with this issue? Maybe I should take an entirely different approach and not have file I/O on the same epoll loop at all.
I already know how to implement the usual freopen(), popen(), or similar stdout/stdin/stderr-based redirection mechanisms, but I wondered how I should apply them to static (own) libraries in C. Say I want to capture the output of a program's printf() calls (or similar) into a file, without letting it appear on the console - are there things I need to be aware of before applying simple fd dup()s and calling the library from the main program? Even piping seems complex, since exec()ing here is risky...
Thanks in advance.
There's an old-timers' trick to force the entire process, regardless of which library the code comes from, to have one of the standard I/O descriptors connected to a different file. You simply close the file descriptor in question, then open a new one. If you close(1), then open("some_file", O_WRONLY | O_CREAT, 0644), then ALL calls that would result in a write to stdout will go to some_file from that point forward.
This works because open() always returns the lowest file descriptor not currently in use. Provided you haven't closed stdin (fd 0), the call to open() will get file descriptor 1.
There are some caveats. FILE streams that haven't flushed their buffers will behave unpredictably, but you probably won't be doing this in the middle of execution. Set it up as your process starts and you'll be golden.
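A minimal sketch of the trick; "some_file.log" is a placeholder name:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    fflush(stdout);      /* push out anything still buffered on the console */
    close(1);            /* free descriptor 1 (stdout) */

    /* open() returns the lowest free descriptor, which is now 1 */
    int fd = open("some_file.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd != 1)
        return 1;        /* something else grabbed fd 1; bail out */

    printf("this goes to some_file.log, not the console\n");
    return 0;
}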
Is there any way in Linux (or, more generally, in a POSIX OS) to guarantee that during the execution of a program, no file descriptors will be reused, even if a file is closed and another opened? My understanding is that this situation would usually lead to the closed file's descriptor being reassigned to the newly opened file.
I'm working on an I/O tracing project and it would make life simpler if I could assume that after an open()/fopen() call, all subsequent I/O to that file descriptor is to the same file.
I'll take either a compile-time or run-time solution.
If it is not possible, I could do my own accounting when I process the trace file (noting the location of all open and close calls), but I'd prefer to squash the problem during execution of the traced program.
Note that POSIX requires:
The open() function shall return a file descriptor for the named file that is the lowest file descriptor not currently open for that process.
So in the strictest sense, your request would make the program's environment no longer POSIX-compliant.
That said, I think your best bet is to use the LD_PRELOAD trick to intercept calls to close and ignore them.
You'd have to write a shared object (.so) containing a close(2) replacement that points old FDs at /dev/null instead of releasing them, and then use $LD_PRELOAD to load it into the process's address space before starting the application.
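A minimal sketch of such an interposer; the file and library names are hypothetical:

/* close_preload.c - build and use (glibc):
 *   gcc -shared -fPIC -o libnoclose.so close_preload.c -ldl
 *   LD_PRELOAD=./libnoclose.so ./traced_program
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <unistd.h>

int close(int fd)
{
    /* Instead of releasing fd, point it at /dev/null so the number
     * stays occupied and open() can never hand it out again. */
    int devnull = open("/dev/null", O_RDWR);
    if (devnull < 0)
        return -1;
    int ret = dup2(devnull, fd);                 /* atomically replaces fd */

    /* Release the temporary descriptor with the *real* close(2),
     * not this wrapper, to avoid recursing into ourselves. */
    int (*real_close)(int) = (int (*)(int))dlsym(RTLD_NEXT, "close");
    real_close(devnull);
    return ret < 0 ? -1 : 0;
}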
You must already be ptrace()ing the application to intercept its file opening and closing operations.
It would appear trivial to prevent FD re-use by "injecting" dup2(X, Y); close(X); calls into the application, and adjusting Y to be anything you want.
However, the application itself could be using dup2() to force reuse of a previously closed FD, and it may not work if you prevent that, so I think you'll just have to deal with this in a post-processing step.
Also, it's quite easy to write an app that will run out of FDs if you disallow re-use.
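For reference, the relocation idiom the injected calls would perform looks like this (relocate_fd is a hypothetical name; in practice the tracer would pick a fresh y for every open):

#include <unistd.h>

/* Move descriptor x to the unused number y, so the number x is never
 * confused with a later open() while the file stays traceable via y. */
void relocate_fd(int x, int y)
{
    dup2(x, y);   /* y now refers to the same open file description */
    close(x);     /* frees the number x; the file stays open via y */
}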
I've written a server-client application, and now I have to log what happens on the server to a log file. The server is written in C, and I can already print what happens to the screen using printf.
So I'll just have to use fprintf instead of printf. My question is: how should I handle the file?
I have a Server.c source file containing the main function.
Here is the basic structure of my Server application:
Server.c

// .. some code
int main(...) {
    // some code
    // initialize variables
    // bind server
    // listen server on port
    while (1)
    {
        // accept client
        int check = pthread_create(&thread, NULL, handle_client, &ctx); // create new thread
        // ..
    } // end while
    return EXIT_SUCCESS;
} // end main
handle_client is a function which handles clients in a new thread.
How should I make the server log? I will have one text file, for example SERVERLOG.log, but there are many clients on the server. How should I handle concurrent access to this file?
One way is to create the file when I start the server, then open it, write to it, and close it each time.
If a client wants to write to the file, it opens the file, writes to it, and then closes it.
But there is still a problem when several clients want to write to this file at the same time...
A common solution is to have a printf-like function that first writes its output to a buffer, then locks a semaphore, does the actual write to the file, and unlocks the semaphore. If you are worried about the actual writing being slow, you can instead have a queue into which all log messages are inserted, and let another thread take items from the queue and write them to the file. You still have to protect the queue with e.g. a semaphore, but it should be quicker than doing I/O.
As for the actual file: either open it in the main thread and leave it open, or, if you have a dedicated logging thread with a queue, let that thread do the opening. Either way, you don't need to keep opening and closing it every time you want to write something; the important part is to protect it from being written to by multiple threads simultaneously.
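A minimal sketch of the first variant, using a pthread mutex in place of a semaphore (server_log and log_fp are hypothetical names; log_fp is assumed to be opened once at server start):

#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>

static FILE *log_fp;   /* opened once in main() */
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

void server_log(const char *fmt, ...)
{
    char buf[1024];
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(buf, sizeof buf, fmt, ap);   /* format outside the lock */
    va_end(ap);

    pthread_mutex_lock(&log_lock);         /* serialize the writers */
    fputs(buf, log_fp);
    fflush(log_fp);
    pthread_mutex_unlock(&log_lock);
}

Each client thread then calls server_log("client %d connected\n", id) instead of printf.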
Just leave it open. Open the log file at server start.
A simple way to avoid badly interleaved output is to use a separate logging process connected by a pipe (or a named pipe). The logger just sits blocked on a read() from the pipe and writes whatever it gets to the file (the reader's stdin and stdout could actually point to the pipe and the file, respectively). The clients just write to the pipe (which can have been dup()d over stderr). Writes to a pipe of up to PIPE_BUF bytes are guaranteed to be atomic.
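A minimal sketch of that idea; start_logger is a hypothetical helper name and path is a placeholder:

#include <fcntl.h>
#include <limits.h>
#include <unistd.h>

/* Fork a child that copies everything written to our stderr into path. */
int start_logger(const char *path)
{
    int pfd[2];
    if (pipe(pfd) < 0)
        return -1;

    if (fork() == 0) {                                /* child: the logger */
        close(pfd[1]);
        int out = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (out < 0)
            _exit(1);
        char buf[PIPE_BUF];
        ssize_t n;
        while ((n = read(pfd[0], buf, sizeof buf)) > 0)
            write(out, buf, n);                       /* pipe -> file */
        _exit(0);
    }

    close(pfd[0]);                                    /* parent keeps write end */
    dup2(pfd[1], STDERR_FILENO);                      /* clients log via stderr */
    close(pfd[1]);
    return 0;
}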
I have a C program with multiple worker threads. There is a main thread which periodically (every 0.2s) does some basic checks (i.e. has a thread finished, has a signal been received, etc.). At each check, I would like to write any data that the threads have accumulated in their log buffers to a single log file.
My initial idea was to simply open the log file, write the data from all the threads, and then close it again. I am worried that this might be too much overhead, seeing as these checks occur every 0.2s.
So my question is - is this scenario inefficient?
If so, can anyone suggest a better solution?
I thought of leaving the file descriptor open and just writing new data on every check, but then there is a problem: if the physical file somehow gets deleted, the program would never know (without re-checking, in which case we might as well just reopen the file) and logged data would be lost.
(This program is designed to run for very long periods of time, so the log file being deleted at some point is basically guaranteed due to log rotation.)
The standard solution on UNIX is to add a signal handler for SIGHUP which closes and re-opens the log file. Many UNIX daemons do this for precisely this purpose, to support log rotation. Call kill -HUP <pid> in your log rotation script and you're good to go.
(Some programs will also treat SIGHUP as a cue to re-read their configuration files, so you can make configuration changes on the fly without having to restart processes.)
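A minimal sketch of the handler, with LOG_PATH and log_fd as hypothetical names:

#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

#define LOG_PATH "/var/log/myserver.log"   /* placeholder path */

static volatile sig_atomic_t reopen_requested;
static int log_fd = -1;

static void on_sighup(int sig)
{
    (void)sig;
    reopen_requested = 1;   /* defer the real work to the main loop */
}

/* Called from the main 0.2s check loop, before writing buffered logs. */
void maybe_reopen_log(void)
{
    if (reopen_requested) {
        reopen_requested = 0;
        if (log_fd >= 0)
            close(log_fd);
        log_fd = open(LOG_PATH, O_WRONLY | O_CREAT | O_APPEND, 0644);
    }
}

/* At startup, install the handler with sigaction(SIGHUP, ...). */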
Currently, there isn't much of a good solution. I would suggest writing a timer that runs separately from your main 0.2s check, and that checks the log-file buffers and writes them to disk.
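A minimal sketch of such a flusher thread; flush_all_log_buffers is a hypothetical function that drains each worker's buffer to the (already open) log file:

#include <pthread.h>
#include <time.h>

extern void flush_all_log_buffers(void);   /* assumed to exist elsewhere */

static void *log_flusher(void *arg)
{
    (void)arg;
    struct timespec interval = { .tv_sec = 1, .tv_nsec = 0 };   /* e.g. every 1s */
    for (;;) {
        nanosleep(&interval, NULL);
        flush_all_log_buffers();
    }
    return NULL;
}

/* At startup: pthread_create(&tid, NULL, log_flusher, NULL); */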
I am working on something network-based that could solve this (I have had the same problem) with excellent performance; fire me a message on GitHub for details.