If I open a single file (using CreateFile()) two times inside a single thread such that I have two valid handles at once, will the two file handles have a shared "file pointer" (SetFilePointer()), or will the two handles have separate independent "file pointers"?
What if there are instead two concurrent threads in one process, and they each hold one handle to the same file? Will those two handles have independent file pointers?
Each time a thread opens a file, a new file object is created with a new set of handle-specific attributes. For example, the current byte offset attribute refers to the location in the file at which the next read or write operation using that handle will occur. Each handle to a file has a private byte offset even though the underlying file is shared. A file object is also unique to a process, except when a process duplicates a file handle to another process (by using the Windows DuplicateHandle function) or when a child process inherits a file handle from a parent process. In these situations, the two processes have separate handles that refer to the same file object.
From Windows Internals, 5th Edition.
Distinct file handles have distinct file pointers, so these scenarios will work without issue (for example, two threads can read from different sections of the same file "concurrently" as long as each uses its own file handle exclusively).
File handles created by distinct calls to CreateFile have independent file pointers. You can use them in separate threads as you intend.
Handles duplicated by DuplicateHandle share a file pointer: don't use it to get a "separate" file handle to be used in another thread.
I am working on an implementation where multiple processes read a regular file A. While this is happening, a new process P starts up and copies the contents from A to regular file B. All processes starting up after this, should now read file B.
To make the switch from A to B, process P creates a temporary file T once B is written. All the processes check whether T exists to decide where to read data from (i.e., read from A if T does NOT exist, and read from B if T exists).
Since T file is just an indicator here, is it better to use a memory mapped file rather than regular file for faster performance?
Using a temp file for synchronization is not safe: checking whether the file exists and then reading the data file are not atomic as a pair. One process could switch the files just after another process has completed the check and is about to read.
If you are developing in C and are allowed to use the IPC APIs, you could set the flag in shared memory and guard it with a semaphore.
Also, the processes should signal that they have finished reading.
I have a multithreaded server program where each thread is required to read the contents of a file to retrieve data requested by the client.
I am using pthreads in C to create the threads, passing each thread the function it will execute.
In the function, if I assign to a new FILE pointer with fopen() and subsequently read the contents of the file with fgets(), will each thread have its own file offset? That is, if thread 1 is reading from the file, and it is on line 5 of the file when thread 2 reads for the first time, does thread 2 start reading at line 5 or is it independent of where thread 1 is in the file?
Each open FILE has only one file pointer. That has one associated file descriptor (FD), and one file position (file offset, as you put it).
But you can fopen the file twice (from two different threads or for that matter from the same thread) - as your edit now implies you are doing. That means you'll have two associated FDs and two separate file positions.
I.e., this has nothing to do with threads per se; if you want separate file positions, you need two FDs, which (with stdio) means two FILEs.
The Unix kernel represents open files using three data structures: the descriptor table, the file table, and the v-node table.
When a process opens a file twice, it gets two different descriptors in the descriptor table and two entries in the file table (so that they have different positions in the same file), and both entries point to one entry in the v-node table.
A child process inherits the parent process's descriptor table, so the kernel maintains a separate descriptor table for each process. But two descriptors from different processes can point to the same entry in the open file table.
So
When the child process does some read on the file, does the offset of that file also change in the parent process?
If 1 is true, is there a convenient way for two processes to get the same effect as fork on the same file? That is, two processes sharing position (offset) information on the same file.
Is there a way to fork so that both processes have totally unrelated tables, like two unrelated processes that just happen to have opened the same files?
When the child process does some read on the file, does the offset of that file also change in the parent process?
Yes, since the offset is stored in the system-wide file table. You can get a similar effect within a single process using dup or dup2.
If 1 is true, is there a convenient way for two processes to get the same effect as fork on the same file? That is, two processes sharing position (offset) information on the same file.
There is a technique called "passing the file descriptor" using Unix domain sockets. Look for "ancillary" data in sendmsg.
Is there a way to fork so that both processes have totally unrelated tables, like two unrelated processes that just happen to have opened the same files?
You have to open the file again to achieve this. Although it doesn't do what you want, you should also look at the FD_CLOEXEC flag.
Hello everyone, I want to ask a question about the flockfile function. I was reading its description and learned that it is used with threads. But I am forking, which means there will be different processes, not threads. Can I use flockfile across different processes? Does it make any difference?
Thanks
The flockfile function doesn't lock a file but the FILE data structure that a process uses to access a file. So this is about the representation in address space that a process has of the file, not necessarily about the file itself.
Even within a single process, if you have different FILEs open on the same file, you can write to that file simultaneously, even if you have locked each of the FILEs by means of flockfile.
For locking the file itself, have a look at flock and lockf, but beware that the rules governing their effects when accessing files through different threads of the same process are complicated.
These functions can only be used within one process.
From the POSIX docs:
In summary, threads sharing stdio streams with other threads can use flockfile() and funlockfile() to cause sequences of I/O performed by a single thread to be kept bundled.
All the rest of that page talks about mutual exclusion between threads. Different processes have different input/output buffers for their file streams, so this locking wouldn't really make sense or be effective across processes.
I'm working on a multi-process program which performs fuzzification on each layer of an RVB file (1 process -> 1 layer). Each child process creates a temp file using the tmpfile() function. After each child process finishes its job, the main process has to read each temp file created and assemble the data. The problem is that I don't know how to read each temp file inside the main process: since I can't access the child process's memory, I can't know the FILE pointer to the temp file it created!
Any idea?
Don't hesitate to ask for clarifications if needed.
The tmpfile() function returns you a FILE pointer to a file with no determinate name - indeed, even the child process cannot readily determine a name for the file, let alone the parent (and on many Unix systems, the file has no name; it has been unlinked before tmpfile() returns to the caller).
extern FILE *tmpfile(void);
So, you are using the wrong temporary file creation primitive if you must convey file names around.
You have a number of options:
Have the parent process create the file streams with tmpfile() so that both the parent and children share the files. There are some minor coordination issues to handle - the parent will need to seek back to the start before reading what the children wrote, and it should only do that after the child has exited.
Use one of the filename generating primitives instead - mkstemp() is good, and if you need a FILE pointer instead of a file descriptor, you can use fdopen() to create one. You are still faced with the problem of getting file names from children to parent; again, the parent could open the files, or you can use a pipe for each child, or some shared memory, or ... take your pick of IPC mechanisms.
Have the parent open a pipe for each child before forking. The child process closes the read end of the pipe and writes to the write end; the parent closes the write end of the pipe and arranges to read from the read end. The issue here with multiple children is that the capacity of any given pipe is finite (64 KiB on modern Linux, and historically as little as 4 KiB). Consequently, you need to ensure the parent reads all the pipes completely, bearing in mind that the children won't be able to exit until all their data has been read (strictly, all but the last buffer full).
Consider using threads - but be aware of the coordination issues using threads.
Decide that you do not need to use multiple threads of control - whether processes or threads - but simply have the main program do the work. This eliminates coordination and IPC issues - it does mean you won't benefit from the multi-core processor on the machine.
Of these, assuming parallel execution is of paramount importance, I'd probably use pipes to get the file names from the children (option 2); it has the fewest coordination issues. But for simplicity, I'd go with 'main program does it all' (option 5).
If you call tmpfile() in the parent process, the child will inherit all open descriptors and will be able to write to the file, and the open file will be accessible to the parent as well.
You could create a tempfile in the parent process and then fork, then have the child process use that.
The child process can send back the filedescriptor to the parent process.
EDIT: example code in APUE site (src.tar.gz/apue.2e/lib, recvfd.c, sendfd.c)
Use threads instead of subprocesses? Put the names of the temporary files in another file? Don't use random names for the temp files, but (for example) names based on the pid of the parent process (to allow several instances of your program to run simultaneously) plus a sequential number?