fileno() and files larger than 2gb - c

I am working with huge files (much larger than 2 GB). My question: is it safe to use fileno() on the stream if the file size is larger than what an int can hold?
Here a quick code snippet:
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <inttypes.h>
#include <unistd.h> /* pread() */
int readstuff(FILE *fp, uint64_t seekpoint, uint64_t seekwidth) {
    char buf[seekwidth];
    /* pread() takes the offset as an off_t, which is 64 bits wide here */
    if (pread(fileno(fp), buf, seekwidth, (off_t)seekpoint) != -1) {
        /* do stuff */
        return 1;
    }
    else {
        return 2;
    }
}
int main(void) {
    FILE *fp = fopen("/testfile", "r");
    if (fp == NULL)
        return 1;
    readstuff(fp, 0, 10000);
    fclose(fp);
    return 0;
}

The file descriptor returned by fileno() is an int, regardless of the size of the file that it is used to open.

Yes, you can. The value of a file descriptor is unrelated to the size of that file.
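To make this concrete, here is a minimal sketch of reading at an offset beyond 2 GiB through a descriptor obtained from fileno(). The helper name read_at is an assumption made for illustration, not something from the question. The point it shows is that the descriptor stays a small int while the offset travels separately as an off_t.

```c
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t on 32-bit systems; keep it before the includes */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes at an absolute offset of the
 * file behind fp. Returns the byte count from pread(), or -1 on error. */
ssize_t read_at(FILE *fp, void *buf, size_t len, off_t offset)
{
    int fd = fileno(fp);   /* small integer, unrelated to the file size */
    if (fd == -1)
        return -1;
    return pread(fd, buf, len, offset);   /* the offset is an off_t, not an int */
}
```

Because pread() does not move the file offset, a call such as read_at(fp, buf, sizeof buf, (off_t)3 << 30) should not disturb later fread() or fgets() calls on the same stream.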

fileno(3) gives a file descriptor, which is a small integer that stays fixed once you have fopen-ed a stdio FILE stream. A process usually has only dozens of file descriptors, and occasionally (for some servers, see the C10K problem) a few tens of thousands of them.
The kernel allocates file descriptors (with e.g. open(2) etc.) and hands out "small" (generally contiguous) integers.
Typically, a file descriptor is a non-negative integer, often less than 100 and generally less than 100000. Of course the file descriptor is unrelated to the file size.
Try ls /proc/self/fd/ to list the file descriptors of the process running that ls command and ls /proc/1234/fd/ to list the file descriptors of process of pid 1234.
The command cat /proc/sys/fs/file-max gives the system-wide maximum number of open file handles (on mine, it is 1629935 now), and each process can use only a fraction of that.
You may limit the number of file descriptors a process (and its children) can have with setrlimit(2) using RLIMIT_NOFILE. The bash builtin ulimit calls that syscall (on my system, the descriptors limit is 1024 by default).
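The per-process limit can also be queried from C. The following is a small sketch using getrlimit(2); the helper name fd_soft_limit is made up for illustration.

```c
#include <sys/resource.h>

/* Hypothetical helper: store the soft limit on open file descriptors
 * in *out. Returns 0 on success, -1 on error. */
int fd_soft_limit(rlim_t *out)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    *out = rl.rlim_cur;   /* descriptor values stay below this number */
    return 0;
}
```

Every descriptor the process receives is smaller than this soft limit, which is another way to see that descriptor values have nothing to do with file sizes.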
Read Advanced Linux Programming.

Related

Why does forking my process cause the file to be read infinitely

I've boiled down my entire program to a short main that replicates the issue, so forgive me for it not making any sense.
input.txt is a text file that has a couple lines of text in it. This boiled down program should print those lines. However, if fork is called, the program enters an infinite loop where it prints the contents of the file over and over again.
As far as I understand fork, the way I use it in this snippet is essentially a no-op. It forks, the parent waits for the child before continuing, and the child is immediately killed.
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
enum { MAX = 100 };
int main(){
freopen("input.txt", "r", stdin);
char s[MAX];
int i = 0;
char* ret = fgets(s, MAX, stdin);
while (ret != NULL) {
//Commenting out this region fixes the issue
int status;
pid_t pid = fork();
if (pid == 0) {
exit(0);
} else {
waitpid(pid, &status, 0);
}
//End region
printf("%s", s);
ret = fgets(s, MAX, stdin);
}
}
Edit: Further investigation has only made my issue stranger. If the file contains fewer than 4 blank lines or fewer than 3 lines of text, it does not break. However, if there are more than that, it loops infinitely.
Edit2: If the file contains 3 lines of numbers it will loop infinitely, but if it contains 3 lines of words it will not.
I am surprised that there is a problem, but it does seem to be a problem on Linux (I tested on Ubuntu 16.04 LTS running in a VMWare Fusion VM on my Mac) — but it was not a problem on my Mac running macOS 10.13.4 (High Sierra), and I wouldn't expect it to be a problem on other variants of Unix either.
As I noted in a comment:
There's an open file description and an open file descriptor behind each stream. When the process forks, the child has its own set of open file descriptors (and file streams), but each file descriptor in the child shares the open file description with the parent. IF (and that's a big 'if') the child process closing the file descriptors first did the equivalent of lseek(fd, 0, SEEK_SET), then that would also position the file descriptor for the parent process, and that could lead to an infinite loop. However, I've never heard of a library that does that seek; there's no reason to do it.
See POSIX open() and fork() for more information about open file descriptors and open file descriptions.
The open file descriptors are private to a process; the open file descriptions are shared by all copies of the file descriptor created by an initial 'open file' operation. One of the key properties of the open file description is the current seek position. That means that a child process can change the current seek position for a parent — because it is in the shared open file description.
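The shared seek position can be observed directly with raw descriptors, without stdio. This is a minimal sketch; the function name offset_after_child_seek is an assumption for illustration, and the path is expected to name a regular, seekable file.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: open path, fork, let the child seek, and report the
 * offset the parent sees afterwards. Returns that offset, or -1 on error. */
off_t offset_after_child_seek(const char *path, off_t child_pos)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: its descriptor is a copy, but the open file description
         * (and therefore the offset) is shared with the parent. */
        lseek(fd, child_pos, SEEK_SET);
        _exit(0);
    }

    off_t seen = -1;
    if (waitpid(pid, NULL, 0) == pid)
        seen = lseek(fd, 0, SEEK_CUR);   /* query the offset without moving it */
    close(fd);
    return seen;
}
```

The parent never seeks to child_pos itself, yet that is the offset it should observe once the child has exited.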
neof97.c
I used the following code — a mildly adapted version of the original that compiles cleanly with rigorous compilation options:
#include "posixver.h"
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
enum { MAX = 100 };
int main(void)
{
if (freopen("input.txt", "r", stdin) == 0)
return 1;
char s[MAX];
for (int i = 0; i < 30 && fgets(s, MAX, stdin) != NULL; i++)
{
// Commenting out this region fixes the issue
int status;
pid_t pid = fork();
if (pid == 0)
{
exit(0);
}
else
{
waitpid(pid, &status, 0);
}
// End region
printf("%s", s);
}
return 0;
}
One of the modifications limits the number of cycles (children) to just 30.
I used a data file with 4 lines of 20 random letters plus a newline (84 bytes total):
ywYaGKiRtAwzaBbuzvNb
eRsjPoBaIdxZZtJWfSty
uGnxGhSluywhlAEBIXNP
plRXLszVvPgZhAdTLlYe
I ran the command under strace on Ubuntu:
$ strace -ff -o st-out -- neof97
ywYaGKiRtAwzaBbuzvNb
eRsjPoBaIdxZZtJWfSty
uGnxGhSluywhlAEBIXNP
plRXLszVvPgZhAdTLlYe
…
uGnxGhSluywhlAEBIXNP
plRXLszVvPgZhAdTLlYe
ywYaGKiRtAwzaBbuzvNb
eRsjPoBaIdxZZtJWfSty
$
There were 31 files with names of the form st-out.808## where the hashes were 2-digit numbers. The main process file was quite large; the others were small, with one of the sizes 66, 110, 111, or 137:
$ cat st-out.80833
lseek(0, -63, SEEK_CUR) = 21
exit_group(0) = ?
+++ exited with 0 +++
$ cat st-out.80834
lseek(0, -42, SEEK_CUR) = -1 EINVAL (Invalid argument)
exit_group(0) = ?
+++ exited with 0 +++
$ cat st-out.80835
lseek(0, -21, SEEK_CUR) = 0
exit_group(0) = ?
+++ exited with 0 +++
$ cat st-out.80836
exit_group(0) = ?
+++ exited with 0 +++
$
It just so happened that the first 4 children each exhibited one of the four behaviours — and each further set of 4 children exhibited the same pattern.
This shows that three out of four of the children were indeed doing an lseek() on standard input before exiting. Obviously, I have now seen a library do it. I have no idea why it is thought to be a good idea, though, but empirically, that is what is happening.
neof67.c
This version of the code, using a separate file stream (and file descriptor) and fopen() instead of freopen() also runs into the problem.
#include "posixver.h"
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
enum { MAX = 100 };
int main(void)
{
FILE *fp = fopen("input.txt", "r");
if (fp == 0)
return 1;
char s[MAX];
for (int i = 0; i < 30 && fgets(s, MAX, fp) != NULL; i++)
{
// Commenting out this region fixes the issue
int status;
pid_t pid = fork();
if (pid == 0)
{
exit(0);
}
else
{
waitpid(pid, &status, 0);
}
// End region
printf("%s", s);
}
return 0;
}
This exhibits the same behaviour, except that the file descriptor on which the seek occurs is 3 instead of 0. So two of my hypotheses are disproven: that it is related to freopen(), and that it is related to stdin. Both are shown to be incorrect by the second test code.
Preliminary diagnosis
IMO, this is a bug. You should not be able to run into this problem.
It is most likely a bug in the Linux (GNU C) library rather than the kernel. It is caused by the lseek() in the child processes. It is not clear (because I've not gone to look at the source code) what the library is doing or why.
GLIBC Bug 23151
GLIBC Bug 23151 - A forked process with unclosed file does lseek before exit and can cause infinite loop in parent I/O.
The bug was created 2018-05-08 US/Pacific, and was closed as INVALID by 2018-05-09. The reason given was:
Please read
http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_05_01,
especially this paragraph:
Note that after a fork(), two handles exist where one existed before. […]
POSIX
The complete section of POSIX referred to (apart from verbiage noting that this is not covered by the C standard) is this:
2.5.1 Interaction of File Descriptors and Standard I/O Streams
An open file description may be accessed through a file descriptor, which is created using functions such as open() or pipe(), or through a stream, which is created using functions such as fopen() or popen(). Either a file descriptor or a stream is called a "handle" on the open file description to which it refers; an open file description may have several handles.
Handles can be created or destroyed by explicit user action, without affecting the underlying open file description. Some of the ways to create them include fcntl(), dup(), fdopen(), fileno(), and fork(). They can be destroyed by at least fclose(), close(), and the exec functions.
A file descriptor that is never used in an operation that could affect the file offset (for example, read(), write(), or lseek()) is not considered a handle for this discussion, but could give rise to one (for example, as a consequence of fdopen(), dup(), or fork()). This exception does not include the file descriptor underlying a stream, whether created with fopen() or fdopen(), so long as it is not used directly by the application to affect the file offset. The read() and write() functions implicitly affect the file offset; lseek() explicitly affects it.
The result of function calls involving any one handle (the "active handle") is defined elsewhere in this volume of POSIX.1-2017, but if two or more handles are used, and any one of them is a stream, the application shall ensure that their actions are coordinated as described below. If this is not done, the result is undefined.
A handle which is a stream is considered to be closed when either an fclose(), or freopen() with non-full(1) filename, is executed on it (for freopen() with a null filename, it is implementation-defined whether a new handle is created or the existing one reused), or when the process owning that stream terminates with exit(), abort(), or due to a signal. A file descriptor is closed by close(), _exit(), or the exec() functions when FD_CLOEXEC is set on that file descriptor.
(1) [sic] Using 'non-full' is probably a typo for 'non-null'.
For a handle to become the active handle, the application shall ensure that the actions below are performed between the last use of the handle (the current active handle) and the first use of the second handle (the future active handle). The second handle then becomes the active handle. All activity by the application affecting the file offset on the first handle shall be suspended until it again becomes the active file handle. (If a stream function has as an underlying function one that affects the file offset, the stream function shall be considered to affect the file offset.)
The handles need not be in the same process for these rules to apply.
Note that after a fork(), two handles exist where one existed before. The application shall ensure that, if both handles can ever be accessed, they are both in a state where the other could become the active handle first. The application shall prepare for a fork() exactly as if it were a change of active handle. (If the only action performed by one of the processes is one of the exec() functions or _exit() (not exit()), the handle is never accessed in that process.)
For the first handle, the first applicable condition below applies. After the actions required below are taken, if the handle is still open, the application can close it.
If it is a file descriptor, no action is required.
If the only further action to be performed on any handle to this open file descriptor is to close it, no action need be taken.
If it is a stream which is unbuffered, no action need be taken.
If it is a stream which is line buffered, and the last byte written to the stream was a <newline> (that is, as if a putc('\n') was the most recent operation on that stream), no action need be taken.
If it is a stream which is open for writing or appending (but not also open for reading), the application shall either perform an fflush(), or the stream shall be closed.
If the stream is open for reading and it is at the end of the file (feof() is true), no action need be taken.
If the stream is open with a mode that allows reading and the underlying open file description refers to a device that is capable of seeking, the application shall either perform an fflush(), or the stream shall be closed.
For the second handle:
If any previous active handle has been used by a function that explicitly changed the file offset, except as required above for the first handle, the application shall perform an lseek() or fseek() (as appropriate to the type of handle) to an appropriate location.
If the active handle ceases to be accessible before the requirements on the first handle, above, have been met, the state of the open file description becomes undefined. This might occur during functions such as a fork() or _exit().
The exec() functions make inaccessible all streams that are open at the time they are called, independent of which streams or file descriptors may be available to the new process image.
When these rules are followed, regardless of the sequence of handles used, implementations shall ensure that an application, even one consisting of several processes, shall yield correct results: no data shall be lost or duplicated when writing, and all data shall be written in order, except as requested by seeks. It is implementation-defined whether, and under what conditions, all input is seen exactly once.
Each function that operates on a stream is said to have zero or more "underlying functions". This means that the stream function shares certain traits with the underlying functions, but does not require that there be any relation between the implementations of the stream function and its underlying functions.
Exegesis
That is hard reading! If you're not clear on the distinction between open file descriptor and open file description, read the specification of open() and fork() (and dup() or dup2()). The definitions for file descriptor and open file description are also relevant, if terse.
In the context of the code in this question (and also for Unwanted child processes being created while file reading), we have a file stream handle open for reading only which has not yet encountered EOF (so feof() would not return true, even though the read position is at the end of the file).
One of the crucial parts of the specification is: The application shall prepare for a fork() exactly as if it were a change of active handle.
This means that the steps outlined for 'first file handle' are relevant, and stepping through them, the first applicable condition is the last:
If the stream is open with a mode that allows reading and the underlying open file description refers to a device that is capable of seeking, the application shall either perform an fflush(), or the stream shall be closed.
If you look at the definition for fflush(), you find:
If stream points to an output stream or an update stream in which the most recent operation was not input, fflush() shall cause any unwritten data for that stream to be written to the file, [CX] ⌦ and the last data modification and last file status change timestamps of the underlying file shall be marked for update.
For a stream open for reading with an underlying file description, if the file is not already at EOF, and the file is one capable of seeking, the file offset of the underlying open file description shall be set to the file position of the stream, and any characters pushed back onto the stream by ungetc() or ungetwc() that have not subsequently been read from the stream shall be discarded (without further changing the file offset). ⌫
It isn't exactly clear what happens if you apply fflush() to an input stream associated with a non-seekable file, but that isn't our immediate concern. However, if you're writing generic library code, then you might need to know whether the underlying file descriptor is seekable before doing a fflush() on the stream. Alternatively, use fflush(NULL) to have the system do whatever is necessary for all I/O streams, noting that this will lose any pushed-back characters (via ungetc() etc).
The lseek() operations shown in the strace output seem to be implementing the fflush() semantics associating the file offset of the open file description with the file position of the stream.
So, for the code in this question, it seems that fflush(stdin) is necessary before the fork() to ensure consistency. Not doing that leads to undefined behaviour ('if this is not done, the result is undefined') — such as looping indefinitely.
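A minimal sketch of that fix follows, assuming a read-only stream on a regular file. The function name count_lines_with_fork is invented for illustration; the loop mirrors the one in the question, with an fflush() on the input stream placed before each fork().

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: read fp line by line, forking one child per line.
 * Returns the number of lines read, or -1 if fork() fails. */
int count_lines_with_fork(FILE *fp)
{
    char line[100];
    int n = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        n++;
        /* Prepare for the fork as POSIX 2.5.1 asks: set the offset of the
         * open file description to the stream's current position. */
        fflush(fp);

        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            exit(0);   /* child still uses exit(), as in the question */
        waitpid(pid, NULL, 0);
    }
    return n;
}
```

With the offset and the stream position already in agreement at fork time, the child's exit() has nothing left to reposition, so the parent should see each line exactly once.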
The exit() call closes all open file handles. After the fork, the child and parent have identical copies of the execution stack, including the FileHandle pointer. When the child exits, it closes the file and resets the pointer.
int main(){
freopen("input.txt", "r", stdin);
char s[MAX];
prompt(s);
int i = 0;
char* ret = fgets(s, MAX, stdin);
while (ret != NULL) {
//Commenting out this region fixes the issue
int status;
pid_t pid = fork(); // At this point both processes have a copy of the filehandle
if (pid == 0) {
exit(0); // At this point the child closes the filehandle
} else {
waitpid(pid, &status, 0);
}
//End region
printf("%s", s);
ret = fgets(s, MAX, stdin);
}
}
As /u/visibleman pointed out, the child process is closing the file and messing things up in main.
I was able to work around it by checking whether stdin has been redirected (that is, whether it is not a terminal) with
!isatty(fileno(stdin))
If stdin has been redirected, the program reads all of it into a linked list before doing any processing or forking.
Replace exit(0) with _exit(0), and all is fine. This is an old unix tradition, if you are using stdio, your forked image must use _exit(), not exit().
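As a rough sketch of this variant, the loop from the question can be written with _exit() in the child. The function name echo_lines_forking and the separate in/out parameters are assumptions for illustration only.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: copy each line of in to out, forking one child per
 * line. Returns the number of lines copied, or -1 if fork() fails. */
int echo_lines_forking(FILE *in, FILE *out)
{
    char line[100];
    int n = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(0);   /* skips stdio cleanup: no flush, no lseek() in the child */
        waitpid(pid, NULL, 0);
        fputs(line, out);
        n++;
    }
    return n;
}
```

Since _exit() does not run the stdio cleanup, the child neither repositions the shared offset nor writes out a second copy of anything still pending in the output buffer.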

How do you programmatically create a completely empty sparse file on linux?

If you run dd with this:
dd if=/dev/zero of=sparsefile bs=1 count=0 seek=1048576
You appear to get a completely unallocated sparse file (this is ext4)
smark@we:/sp$ ls -ls sparsefile
0 -rw-rw-r-- 1 smark smark 1048576 Nov 24 16:19 sparsefile
fibmap agrees:
smark@we:/sp$ sudo hdparm --fibmap sparsefile
sparsefile:
filesystem blocksize 4096, begins at LBA 2048; assuming 512 byte sectors.
byte_offset begin_LBA end_LBA sectors
Without having to dig through the source of dd, I'm trying to figure out how to do that in C.
I tried fseeking and fwriting zero bytes, but it did nothing.
Not sure what else to try, I figured somebody might know before I hunt down dd's innards.
EDIT: including my example...
FILE *f = fopen("/sp/sparse2", "wb");
fseek(f, 1048576, SEEK_CUR);
fwrite("x", 1, 0, f);
fclose(f);
When you write to a file using write or various library routines that ultimately call write, there's a file offset pointer associated with the file descriptor that determines where in the file the bytes will go. It's normally positioned at the end of the data that was processed by the most recent call to read or write. But you can use lseek to position the pointer anywhere within the file, and even beyond the current end of the file. When you write data at a point beyond the current EOF, the area that was skipped is conceptually filled with zeroes. Many systems will optimize things so that any whole filesystem blocks in that skipped area simply aren't allocated, producing a sparse file. Attempts to read such blocks will succeed, returning zeroes.
Writing block-sized areas full of zeroes to a file generally won't produce a sparse file, although it's possible for some filesystems to do this.
Another way to produce a sparse file, used by GNU dd, is to call ftruncate. The documentation says this:
The ftruncate() function causes the regular file referenced by fildes to have a size of length bytes.
If the file previously was larger than length, the extra data is discarded. If it was previously shorter than length, it is unspecified whether the file is changed or its size increased. If the file is extended, the extended area appears as if it were zero-filled.
Support for sparse files is filesystem-specific, although virtually all designed-for-UNIX local filesystems support them.
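The seek-then-write approach described at the start of this answer can be sketched as follows. The function name make_sparse_by_seek is an assumption for illustration. Note that this variant writes one real byte at the end, so the final block is usually allocated and the file is not completely empty the way the ftruncate() result can be.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create path with a logical size of `size` bytes by
 * seeking past EOF and writing a single zero byte.
 * Returns 0 on success, -1 on error. */
int make_sparse_by_seek(const char *path, off_t size)
{
    if (size <= 0)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    /* Position on the last byte, then write one zero byte (the string
     * terminator of ""). The skipped range reads back as zeroes. */
    int ok = lseek(fd, size - 1, SEEK_SET) != (off_t)-1
          && write(fd, "", 1) == 1;

    if (close(fd) == -1)
        ok = 0;
    return ok ? 0 : -1;
}
```

This also suggests why the attempt in the question changed nothing: a write of zero bytes does not extend the file, so at least one byte has to be written after the seek.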
This is complementary to the answer by @MarkPlotnick: a simple sample implementation of the feature you requested, using ftruncate():
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
int
main(void)
{
    int file;
    mode_t mode;
    /* rw-r--r-- */
    mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH;
    file = open("sparsefile", O_WRONLY | O_CREAT, mode);
    if (file == -1)
        return 1;
    /* Set the size to 1 MiB without writing any data */
    if (ftruncate(file, 0x100000) == -1) {
        close(file);
        return 1;
    }
    close(file);
    return 0;
}

What is the correct approach to write multiple small pieces to a temp file in c, in multithreads?

I am simulating multithreaded file downloading. My strategy is that each thread receives small file pieces (each piece has a piece_length, a piece_size and a start_writing_pos),
and then each thread writes to the same buffer. How do I implement this? Do I have to worry about collisions?
//=================== follow up ============//
so I write a small demo as follows:
#include <stdio.h>
int main(){
char* tempfilePath = "./testing";
FILE *fp;
fp = fopen(tempfilePath,"w+");//w+: for reading and writing
fseek( fp, 9, SEEK_SET);//starting in 10-th bytes
fwrite("----------",sizeof(char), 10, fp);
fclose(fp);
}
Before running it, I set the content of "./testing" to "XXXXXXXXXXXXXXXXXXX". After running the code above I get "^@^@^@^@^@^@^@^@^@----------". I wonder where the problem is...
Do what most torrent clients do. Create a file with the final size having an extension .part. Then allocate non-overlapping parts of the file to each thread, who shall have their own file-descriptors. Thus collisions are avoided. Rename to final name when finished.
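Unless you want to use a mutex, you can't use fwrite(). FILE *-based IO using fopen(), fwrite(), and all related functions is not suited to independent writes from several threads: the FILE uses a SINGLE buffer, a SINGLE offset, etc., so one thread's fseek() can be overtaken by another thread's fseek() before its fwrite() runs.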
You can't even use open() and lseek()/write() - multiple threads will interfere with each other, modifying the one offset an open file descriptor has.
Use open() to open the file, and use pwrite() to write data to exact offsets.
pwrite() man page:
pwrite() writes up to count bytes from the buffer starting at buf to
the file descriptor fd at offset offset. The file offset is not
changed.
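A minimal sketch of the pwrite() approach follows. The names open_part_file and write_piece are hypothetical helpers invented for illustration; each downloading thread would call write_piece() with its own buffer and starting offset.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create the .part file and give it its final size.
 * Returns an open descriptor, or -1 on error. */
int open_part_file(const char *path, off_t final_size)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, final_size) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: write all len bytes of buf at absolute offset off,
 * retrying after short writes. Returns 0 on success, -1 on error. */
int write_piece(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted by a signal: try again */
        if (n <= 0)
            return -1;
        p += n;
        off += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Because pwrite() takes the offset as an argument and leaves the descriptor's own offset alone, pieces can arrive in any order. Opening an existing file with open() and no O_TRUNC also keeps its old contents, whereas fopen() with "w+" truncates the file first, which would account for the NUL bytes seen in the follow-up demo.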

read function in c only reading 3072 bytes

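#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h> /* read(), close() */
#include <stdio.h>
int main(void){
    int fd;
    char bf[4096];
    int buf_size=4096;
    fd = open("/proc/18022/cmdline", O_RDONLY);
    if (fd == -1)
        return 1;
    int bytes = read(fd, bf, buf_size-1);
    if (bytes < 0)
        return 1;
    bf[bytes] = '\0'; /* terminate before printing with %s */
    printf("%d %s\n\n",bytes,bf);
    close(fd);
    return 0;
}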
The code above always reads only 3072 bytes, although cmdline has more than 3072 characters.
If I copy the content of cmdline into gedit, save it, and run the code on this newly created file, it reads all the bytes of the file.
I googled it and found that read() reads up to SSIZE_MAX bytes, but then why does it read all the bytes in the second case?
You should not rely on reading a whole file from the first try, even if you know you've allocated enough space for the read. Instead you should read in chunks and process the bytes chunk-by-chunk:
ssize_t cnt;
while ((cnt = read(fd, bf, buf_size - 1)) > 0) {
    // process the cnt bytes just read, or append them to
    // a larger buffer
}
Quoting from the man page for read():
It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal.
For /proc files, we can see here that:
The most distinctive thing about files in this directory is the fact that all of them have a file size of 0, with the exception of kcore, mtrr and self.
and
You might wonder how you can see details of a process that has a file size of 0. It makes more sense if you think of it as a window into the kernel. The file doesn't actually contain any data; it just acts as a pointer to where the actual process information resides.
which means that the contents of those pseudo-files are sent by the kernel, in batches as large as the kernel wants. This looks very similar to a pipe, where a producer writes data and a consumer reads it, each of them operating at different speeds.
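The chunked reading described above can be wrapped in a small helper that keeps calling read() until end-of-file or until the buffer is full. This is only a sketch; the name read_all is an assumption for illustration.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read from fd until EOF or until cap bytes are
 * stored in buf. Returns the total number of bytes read, or -1 on error. */
ssize_t read_all(int fd, char *buf, size_t cap)
{
    size_t total = 0;

    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);
        if (n == 0)
            break;                  /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: try again */
            return -1;
        }
        total += (size_t)n;         /* a short read is normal; keep going */
    }
    return (ssize_t)total;
}
```

The helper does not NUL-terminate the buffer, so a caller that wants to print the result as a string should reserve one byte and add the terminator itself.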

C: multi-processes stdio append mode

I wrote this code in C:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <sys/time.h> /* gettimeofday() */
#include <unistd.h> /* getpid() */
void random_seed(void){
    struct timeval tim;
    gettimeofday(&tim, NULL);
    double t1=tim.tv_sec+(tim.tv_usec/1000000.0);
    srand ((unsigned)t1);
}
int main(void){
    FILE *f;
    int i;
    int size=100;
    char *buf=(char*)malloc(size);
    f = fopen("output.txt", "a");
    setvbuf (f, buf, _IOFBF, size);
    random_seed();
    for(i=0; i<200; i++){
        fprintf(f, "[ xx - %d - 012345678901234567890123456789 - %d]\n", rand()%10, (int)getpid());
        fflush(f);
    }
    fclose(f);
    free(buf);
    return 0;
}
This code opens a file in append mode and appends a string to it 200 times.
I set a buffer of size 100, which can contain the full string.
Then I started multiple processes running this code with this bash script:
#!/bin/bash
gcc source.c
rm output.txt
for i in `seq 1 100`;
do
./a.out &
done
I expected that the strings in the output would never be mixed up, since I read that when a file is opened with the O_APPEND flag the file offset is set to the end of the file prior to each write, and I'm using a fully buffered stream. But the first line of each process comes out mixed up like this:
[ xx - [ xx - 7 - 012345678901234567890123456789 - 22545]
and some lines later
2 - 012345678901234567890123456789 - 22589]
It looks like the write is interrupted by the call to rand().
So why do these lines appear?
Is using file locks the only way to prevent this, even though I'm only using append mode?
Thanks in advance!
You will need to implement some form of concurrency control yourself: POSIX makes no guarantees with respect to concurrent writes from multiple processes. You get some guarantees for pipes, but not for regular files written to from different processes.
Quoting POSIX write():
This volume of POSIX.1-2008 does not specify behavior of concurrent writes to a file from multiple processes. Applications should use some form of concurrency control.
(At the end of the Rationale section.)
You open the file in the fully buffered mode. That means that every line of the output first goes into the buffer and when the buffer overflows it gets flushed to the file regardless whether it contains incomplete lines. That causes chunks of output from different processes writing into the same file concurrently to be interleaved.
An easy fix would be to open the file in line buffered mode _IOLBF, so that the buffer gets flushed on each complete line. Just make sure that the buffer size is at least as big as your longest line, otherwise it will end up writing incomplete lines. The buffer is normally flushed with a single write() system call, so that lines from different processes won't interleave each other.
There is no guarantee that write() system call is atomic for different filesystems though, but it normally works as expected because write() normally locks the file descriptor in the kernel with a mutex before proceeding.
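If explicit concurrency control is wanted, one option is an advisory lock around each record. The sketch below uses flock(2), which is available on Linux and the BSDs but is not part of POSIX; the helper name append_line_locked is an assumption for illustration, and fcntl() record locks would be the POSIX alternative.

```c
#include <stdio.h>
#include <sys/file.h>

/* Hypothetical helper: append one complete line to f while holding an
 * exclusive advisory lock. Returns 0 on success, -1 on error. */
int append_line_locked(FILE *f, const char *line)
{
    int fd = fileno(f);

    if (flock(fd, LOCK_EX) == -1)   /* blocks until no other process holds the lock */
        return -1;

    /* Flush while the lock is held so the whole line reaches the file
     * before another process may write. */
    int ok = fputs(line, f) != EOF && fflush(f) == 0;

    flock(fd, LOCK_UN);
    return ok ? 0 : -1;
}
```

This only helps if every writer takes the same lock; a process that writes without calling flock() is not held back by it.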

Resources