Reading from a file in different threads in C

I have a multithreaded server program where each thread is required to read the contents of a file to retrieve data requested by the client.
I am using pthreads in C to create a thread and pass it the function that the thread will execute.
In the function, if I assign to a new FILE pointer with fopen() and subsequently read the contents of the file with fgets(), will each thread have its own file offset? That is, if thread 1 is reading from the file, and it is on line 5 of the file when thread 2 reads for the first time, does thread 2 start reading at line 5 or is it independent of where thread 1 is in the file?

Each open FILE has only one file pointer: one associated FD and one file position (file offset, as you put it).
But you can fopen the file twice (from two different threads or, for that matter, from the same thread), as your edit now implies you are doing. That means you'll have two associated FDs and two separate file positions.
I.e., this has nothing to do with threads per se; if you want separate file positions, you need two FDs, which (with stdio) means two FILEs.
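
As a minimal sketch of that point (the file name and line processing are just placeholders), each thread can call fopen() itself and get a private FILE* with its own offset:

    #include <pthread.h>
    #include <stdio.h>

    /* Each thread opens the file itself, so it gets a private FILE* and a
       private file position; the other thread's reads do not move it. */
    static void *reader(void *arg)
    {
        const char *path = arg;
        FILE *fp = fopen(path, "r");
        if (fp == NULL)
            return NULL;

        char line[256];
        while (fgets(line, sizeof line, fp) != NULL) {
            /* process the line */
        }
        fclose(fp);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, reader, (void *)"data.txt");
        pthread_create(&t2, NULL, reader, (void *)"data.txt");
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }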

Related

Synchronization between two processes using semaphores in C

I have a task in which I have to write a program in C that manages access to a file and reading/writing it.
When the program starts it should create two processes (using fork()).
- The first process will be responsible for the initial write to the file (the file is a text file with 2000 random characters from a to z).
- The second process will be responsible for reading from the file, after the first process has finished writing.
My question is:
How can I synchronize the execution order using semaphores (the sem*() system calls) to ensure that the first process always starts first and the second process starts only after the first one has finished writing?
I can recommend using binary semaphores:
https://www.freertos.org/xSemaphoreCreateBinary.html
https://controllerstech.com/how-to-use-binary-semaphore-in-stm32/
If you are working in an embedded context, I would recommend using task notifications instead, since they are less RAM-hungry and therefore may be a better fit on a less powerful system:
https://www.freertos.org/RTOS-task-notifications.html
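
For the fork()-based scenario in the question itself, here is a minimal sketch assuming an unnamed, process-shared POSIX semaphore placed in anonymous shared memory (the file name, contents, and error handling are illustrative; link with -pthread):

    #include <semaphore.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        /* The semaphore must live in memory visible to both children. */
        sem_t *done = mmap(NULL, sizeof *done, PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (done == MAP_FAILED) { perror("mmap"); exit(1); }
        sem_init(done, /* pshared = */ 1, /* value = */ 0);

        if (fork() == 0) {                      /* writer */
            FILE *fp = fopen("data.txt", "w");
            for (int i = 0; i < 2000; i++)
                fputc('a' + rand() % 26, fp);
            fclose(fp);
            sem_post(done);                     /* signal: writing finished */
            _exit(0);
        }

        if (fork() == 0) {                      /* reader */
            sem_wait(done);                     /* block until the writer posts */
            FILE *fp = fopen("data.txt", "r");
            int n = 0;
            while (fgetc(fp) != EOF)
                n++;
            fclose(fp);
            printf("read %d characters\n", n);
            _exit(0);
        }

        wait(NULL);
        wait(NULL);
        sem_destroy(done);
        return 0;
    }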

C memory mapped file vs regular file

I am working on an implementation where multiple processes read a regular file A. While this is happening, a new process P starts up and copies the contents of A to a regular file B. All processes starting up after this should now read file B.
To make the switch from A to B, process P creates a temporary file T once B is written. All the processes check whether T exists to decide where to read data from (i.e., read from A if T does NOT exist, and read from B if T exists).
Since the file T is just an indicator here, would it be better to use a memory-mapped file rather than a regular file for faster performance?
Using a temp file for synchronization is not safe. The check whether the file exists and the subsequent read are not atomic: one process could switch the files just after another process has completed the check and is about to read.
If you are developing in C and are allowed to use the IPC API, you could set the flag in shared memory and guard it with a semaphore.
Also, the processes should signal that they have finished reading.
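
A minimal sketch of that suggestion, assuming POSIX shared memory (shm_open) and a named semaphore; all names are made up and most error handling is omitted (link with -lrt -pthread):

    #include <fcntl.h>
    #include <semaphore.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    struct switch_state {
        int use_b;                /* 0: read file A, 1: read file B */
    };

    static struct switch_state *open_state(void)
    {
        int fd = shm_open("/file_switch", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, sizeof(struct switch_state));
        return mmap(NULL, sizeof(struct switch_state),
                    PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    }

    /* Reader side: decide which file to open. */
    static const char *pick_file(void)
    {
        sem_t *lock = sem_open("/file_switch_lock", O_CREAT, 0600, 1);
        struct switch_state *st = open_state();

        sem_wait(lock);
        const char *path = st->use_b ? "B" : "A";
        sem_post(lock);
        return path;
    }

    /* Process P: after copying A to B, flip the flag under the lock. */
    static void publish_b(void)
    {
        sem_t *lock = sem_open("/file_switch_lock", O_CREAT, 0600, 1);
        struct switch_state *st = open_state();

        sem_wait(lock);
        st->use_b = 1;
        sem_post(lock);
    }

    int main(void)
    {
        printf("before: read %s\n", pick_file());
        publish_b();
        printf("after:  read %s\n", pick_file());
        return 0;
    }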

How many independent file pointers/positions exist on Windows?

If I open a single file (using CreateFile()) two times inside a single thread such that I have two valid handles at once, will the two file handles have a shared "file pointer" (SetFilePointer()), or will the two handles have separate independent "file pointers"?
What if there are instead two concurrent threads in one process and each holds one handle to the same file? Will those two handles have independent file pointers?
Each time a thread opens a file, a new file object is created with a new set of handle-specific attributes. For example, the current byte offset attribute refers to the location in the file at which the next read or write operation using that handle will occur. Each handle to a file has a private byte offset even though the underlying file is shared. A file object is also unique to a process, except when a process duplicates a file handle to another process (by using the Windows DuplicateHandle function) or when a child process inherits a file handle from a parent process. In these situations, the two processes have separate handles that refer to the same file object.
(Windows Internals, 5th Edition)
Distinct file handles have distinct file pointers, so these scenarios will work without issue (for example, two threads can read from different sections of the same file "concurrently" as long as each uses its own file handle exclusively).
File handles created by distinct calls to CreateFile have independent file pointers. You can use them in separate threads as you intend.
Handles duplicated with DuplicateHandle share a file pointer, so don't use DuplicateHandle to get a "separate" file handle for use in another thread.
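
As a small sketch of the first case (two independent CreateFile calls in one process; the file name is just an example):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        HANDLE h1 = CreateFileA("data.txt", GENERIC_READ, FILE_SHARE_READ,
                                NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        HANDLE h2 = CreateFileA("data.txt", GENERIC_READ, FILE_SHARE_READ,
                                NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h1 == INVALID_HANDLE_VALUE || h2 == INVALID_HANDLE_VALUE)
            return 1;

        /* Move h1's file pointer; h2's stays at offset 0. */
        SetFilePointer(h1, 100, NULL, FILE_BEGIN);

        char buf1[16], buf2[16];
        DWORD n1 = 0, n2 = 0;
        ReadFile(h1, buf1, sizeof buf1, &n1, NULL);   /* reads at offset 100 */
        ReadFile(h2, buf2, sizeof buf2, &n2, NULL);   /* reads at offset 0   */

        printf("h1 read %lu bytes, h2 read %lu bytes\n",
               (unsigned long)n1, (unsigned long)n2);

        CloseHandle(h1);
        CloseHandle(h2);
        return 0;
    }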

In Linux, is it possible to do partial reads on a regular file

I need to write an application that spits out log entries to a regular file at a very fast rate. There will also be another process that can read the same file concurrently, while the first process is writing to it.
I have the following questions:
How does read() determine EOF, especially in the case where the underlying file could be concurrently modified?
Is it possible for read() to return partially written data from the other process's write? For example, if the writing process wrote half a line, would read() pick up that half line and return it?
The application would be written in C on Linux 2.6.x using the ext4 filesystem.
UPDATE:
The link below points to the patch that locks the inode in ext4 before reading and writing.
http://patchwork.ozlabs.org/patch/91834/
How does read() determine EOF, especially in the case where the underlying file could be concurrently modified?
When read() is called at or past the current end of the file it returns 0, which is how it signals EOF. You can still seek back and forth and read again as the file grows (only if the file descriptor refers to a regular file rather than a pipe or socket, though).
Is it possible for read() to return partially written data from the other process's write? For example, if the writing process wrote half a line, would read() pick up that half line and return it?
Quite possible.
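
As a sketch of how a reader typically copes with both points, here is a hypothetical follow-the-log loop (file name is an assumption): read() returning 0 is treated as "no more data yet" rather than a hard stop, and whatever data is returned may end mid-line if the writer is mid-write():

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("app.log", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        char buf[4096];
        for (;;) {
            ssize_t n = read(fd, buf, sizeof buf);
            if (n > 0) {
                /* May end mid-line; buffer until a newline if that matters. */
                fwrite(buf, 1, (size_t)n, stdout);
            } else if (n == 0) {
                /* At the current end of file; the writer may append more. */
                usleep(100000);
            } else {
                perror("read");
                break;
            }
        }
        close(fd);
        return 0;
    }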

execve() and sharing file descriptors

I read in the man pages for execve that when a process (A) calls execve, the already-opened file descriptors are copied to the new process (B).
Two possibilities arise here:
1) Does it mean that a new file descriptor table is created for process B, whose entries are copied from the older file descriptor table of process A?
2) Or does process B get the file descriptor table of process A, since after execve process A ceases to exist and the already-opened files can only be closed from process B if it gets process A's file descriptor table?
Which one is correct?
execve does not create a new process. It replaces the calling process's program image, memory space, etc. with new ones based on an executable file from the filesystem. The file descriptor table is modified by closing any descriptors with close-on-exec flag set; the rest remain open and in the same state they were in (current position, locks, etc.) prior to execve.
You're probably confusing this with what happens on fork since execve is usually preceded by fork. When a process forks, the child process has a new file descriptor table referring to the same open file descriptions as the parent process's file descriptor table.
Which one is correct?
#2
Though what you ask about is more of an OS implementation detail; it is rarely, if ever, important to applications, is completely transparent to them, and depends on the OS.
It is normally said that the new process inherits the file descriptors. Except those with the FD_CLOEXEC flag set, obviously.
Even in case of #1, if we presume that for some short time both process A and process B exist in memory (not really, that's fork() territory), copying the fd table would still be fine. Process A is terminated by the exec(), so all of its file descriptors are close()d, and that has no effect on the already-copied file descriptors in process B.

File descriptors are essentially pointers to a kernel structure that holds the actual information about what the descriptor refers to. Copying the fd table does not copy those underlying structures; it copies only the pointers. The kernel structure contains a reference counter (required to implement fork()), which is incremented when a copy is made, so the kernel knows how many processes are using it. Calling close() on a file descriptor first decrements that reference counter; only when the counter drops to zero (no more processes are using the structure) does the OS actually close the underlying file/socket/pipe/etc.

(And even if two processes were briefly present inside the kernel at the same time, user-space applications could not observe it, since after exec() the process keeps the PID of the original.)
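
A small sketch of the inheritance behaviour (Linux-specific; it lists /proc/self/fd in the exec'd image, and the paths used are just examples): a descriptor opened without O_CLOEXEC survives execve, one opened with it does not.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int kept    = open("/etc/hostname", O_RDONLY);               /* inherited across exec */
        int dropped = open("/etc/hostname", O_RDONLY | O_CLOEXEC);   /* closed on exec */
        if (kept < 0 || dropped < 0) { perror("open"); return 1; }

        printf("before exec: kept=%d dropped=%d\n", kept, dropped);
        fflush(stdout);

        /* In the new program image, 'kept' still appears in /proc/self/fd,
           while 'dropped' does not. */
        execl("/bin/ls", "ls", "-l", "/proc/self/fd", (char *)NULL);
        perror("execl");   /* reached only if execl fails */
        return 1;
    }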