GOAL: write in the trace_pipe only if openat is called with O_RDONLY flag. I've build the struct looking the format contained here /sys/kernel/debug/tracing/events/syscalls/sys_enter_open/format
PROBLEM I think I'm not accessing to the flags field because it looks like that the second if statement is always false.
QUESTION: am I correctly accessing to the flags fields? Is there a way to print flags variable content?
struct syscalls_enter_openat_args {
__u64 pad;
int __syscall_nr;
const char * filename;
int flags;
unsigned short modep;
};
SEC("tracepoint/syscalls/sys_enter_openat")
int bpf_sys(struct syscalls_enter_openat_args *ctx)
{
char fmt[] = "llo\n";
int flags = ctx->flags;
if (flags){
if (flags == O_RDONLY)
bpf_trace_printk(fmt, sizeof(fmt));
}
return 0;
}
char _license[] SEC("license") = "GPL";
So you mention that the following check always evaluates to false:
if (flags == O_RDONLY)
This may be because there are more flags than just O_RDONLY that are passed to openat() through the variable flags. From the openat() manual page:
The argument flags must include one of the following access modes: O_RDONLY, O_WRONLY, or O_RDWR. These request opening the file read-only, write-only, or read/write, respectively.
In addition, zero or more file creation flags and file status flags can be bitwise-or'd in flags. The file creation flags are O_CLOEXEC, O_CREAT, O_DIRECTORY, O_EXCL, O_NOCTTY, O_NOFOLLOW, O_TMPFILE, and O_TRUNC. The file status flags are all of the remaining flags listed below. The distinction between these two groups of flags is that the file creation flags affect the semantics of the open operation itself, while the file status flags affect the semantics of subsequent I/O operations. The file status flags can be retrieved and (in some cases) modified; see fcntl(2) for details.
So instead of checking if your flags are equal to O_RDONLY, you might want to check if they include the flag, by bit-masking it like this:
if (flags & O_RDONLY)
As for printing the value of flags, it is probably doable with something like this (not tested):
char fmt[] = "flags: %x\n";
int flags = ctx->flags;
if (flags & O_RDONLY)
bpf_trace_printk(fmt, sizeof(fmt), flags);
I am starting serial port programming in Linux. After reading several examples on the web, I don't understand exact effect of fcntl(fd, F_SETFL, 0)? It is clearing bits, but what flags does it affect? What does it set and or clear?
Take one by one
1) Function call used
fcntl() - It perform the operation on file descriptor passed in argument.
2) 2nd argument in call
F_SETFL (int)
Set the file status flags to the value specified by arg. File access
mode (O_RDONLY, O_WRONLY, O_RDWR) and file creation flags (i.e.,
O_CREAT, O_EXCL, O_NOCTTY, O_TRUNC) in arg are ignored. On Linux this
command can change only the O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME,
and O_NONBLOCK flags.
3) 3rd argument in call
It is 0 means, It set file status flag to zero.
As Jean-Baptiste Yunès said in comment.
file access mode and file creation flags are ignored. This command
reset every other flags: no append, no async, no direct, no atime, and
no nonblocking
So finally
fcntl(fd, F_SETFL, 0)
This call will set opened file desciptor's file status flag to value 0.
But idealy this way we should not change file status flag.
Best way is to first get the current file status flag using F_GETFL and then just change required bit in that.
See example:
If you want to modify the file status flags, you should get the current flags with F_GETFL and modify the value. Don’t assume that the flags listed here are the only ones that are implemented; your program may be run years from now and more flags may exist then. For example, here is a function to set or clear the flag O_NONBLOCK without altering any other flags:
/* Set the O_NONBLOCK flag of desc if value is nonzero,
or clear the flag if value is 0.
Return 0 on success, or -1 on error with errno set. */
int
set_nonblock_flag (int desc, int value)
{
int oldflags = fcntl (desc, F_GETFL, 0);
/* If reading the flags failed, return error indication now. */
if (oldflags == -1)
return -1;
/* Set just the flag we want to set. */
if (value != 0)
oldflags |= O_NONBLOCK;
else
oldflags &= ~O_NONBLOCK;
/* Store modified flag word in the descriptor. */
return fcntl (desc, F_SETFL, oldflags);
}
per the man page for fcntl()
F_SETFL (int)
Set the file status flags to the value specified by arg. File
access mode (O_RDONLY, O_WRONLY, O_RDWR) and file creation
flags (i.e., O_CREAT, O_EXCL, O_NOCTTY, O_TRUNC) in arg are
ignored. On Linux this command can change only the O_APPEND,
O_ASYNC, O_DIRECT, O_NOATIME, and O_NONBLOCK flags. It is not
possible to change the O_DSYNC and O_SYNC flags; see BUGS,
below.
This is from the man page.
While trying to use fcntl() with command F_GETFL and F_SETFL, I got some questions:
Why the flag returned by fcntl(fd, F_GETFL) only include a subset of bits of what I set when open file? Does it only show the ones that are modifiable?
When use fcntl(fd, F_SETFL, flag), how should I pass the flag param, do I need to read flag via fcntl(fd, F_GETFL) first, then modify it and pass it? Or internally it just do a bit & operation with the new param?
Where can I find a full list of the 32 (or less) bits of open file flags?
Code - [dup_fd_share.c]:
// prove duplicated fd shared file offset and open file status,
// TLPI exercise 5.5
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#define BUF_SIZE 100
void fd_share() {
char *fp = "/tmp/fd_share.txt";
char *buf = "abc\n";
int write_size = 4;
int fd, fd2;
off_t cur, cur2;
int open_flag, open_flag2;
// open
int flag = O_RDWR | O_CREAT | O_TRUNC | O_APPEND;
printf("file flag param: %x\n", flag);
fd = open(fp, flag, 0644);
// dup
fd2 = dup(fd);
// initial offset
cur = lseek(fd, 0, SEEK_CUR);
printf("fd[%d] offset: %ld\n", fd, cur);
cur2= lseek(fd2, 0, SEEK_CUR);
printf("fd[%d] offset: %ld\n", fd2, cur2);
// write, offset change,
write(fd, buf, 4);
printf("write %d chars\n", write_size);
// new offset
cur = lseek(fd, 0, SEEK_CUR);
printf("fd[%d] offset: %ld\n", fd, cur);
cur2= lseek(fd2, 0, SEEK_CUR);
printf("fd[%d] offset: %ld\n", fd2, cur2);
// get original open file flag,
open_flag = fcntl(fd, F_GETFL);
printf("fd[%d] open flag: %x\n", fd, open_flag);
open_flag2 = fcntl(fd2, F_GETFL);
printf("fd[%d] open flag: %x\n", fd2, open_flag2);
// change open file flag,
open_flag &= ~O_APPEND;
if(fcntl(fd, F_SETFL, open_flag) == -1) {
printf("failed to set flag\n");
return;
}
printf("change open file flag, remove %s\n", "O_APPEND");
open_flag = fcntl(fd, F_GETFL);
printf("fd[%d] open flag: %x\n", fd, open_flag);
open_flag2 = fcntl(fd2, F_GETFL);
printf("fd[%d] open flag: %x\n", fd2, open_flag2);
close(fd);
}
int main(int argc, char *argv[]) {
fd_share();
return 0;
}
Output:
file flag param: 642
fd[3] offset: 0
fd[4] offset: 0
write 4 chars
fd[3] offset: 4
fd[4] offset: 4
fd[3] open flag: 402
fd[4] open flag: 402
change open file flag, remove O_APPEND
fd[3] open flag: 2
fd[4] open flag: 2
You asked:
Why the flag returned by fcntl(fd, F_GETFL) only include a subset of bits of what I set when open file? Does it only show the ones that are modifiable?
No; it only shows the ones that are "remembered" by the system, such as O_RDWR. These can really be called "flags". Some of the other bits ORed into the oflag parameter are more like "imperative instructions" to the open system call: for example, O_CREAT says "please create this file if it doesn't exist" and O_TRUNC says "please truncate it", neither of which are "flags". A file that was truncated on creation is indistinguishable from a file that was not truncated on creation: they're both just "files". So after open is done creating or truncating the file, it doesn't bother to "remember" that history. It only "remembers" important things, like whether the file is open for reading or writing.
Edited to add: These different kinds of flags have semi-official names. O_RDWR is the "access mode" (remembered, non-modifiable); O_APPEND is an "operating mode" (remembered, usually modifiable); O_TRUNC is an "open-time flag" (pertains to the open operation itself, not to the file descriptor; therefore not remembered). Notice that the "access mode" is not modifiable — you can't use fcntl to turn a read-only fd into a write-only fd.
When use fcntl(fd, F_SETFL, flag), how should I pass the flag param, do I need to read flag via fcntl(fd, F_GETFL) first, then modify it and pass it? Or internally it just do a bit & operation with the new param?
F_SETFL overwrites the flags with exactly what you pass in (although it will ignore your puny attempts to set bits-that-aren't-really-flags, such as O_TRUNC). IF you want to set a specific flag and leave the other flags as-is, then you must F_GETFL the old flags, | the new flag in, and then F_SETFL the result. This must be done as two separate system calls; there is no atomic or thread-safe way to accomplish it as far as I know.
Where can I find a full list of the 32 (or less) bits of open file flags?
In fcntl.h or its documentation (man fcntl). For example, on my MacBook the man page says:
The flags for the F_GETFL and F_SETFL commands are as follows:
O_NONBLOCK Non-blocking I/O; if no data is available to a read call, or if a write operation would block, the read or write
call returns -1 with the error EAGAIN.
O_APPEND Force each write to append at the end of file; corresponds to the O_APPEND flag of open(2).
O_ASYNC Enable the SIGIO signal to be sent to the process group when I/O is possible, e.g., upon availability of data to be
read.
In other words, there are exactly three bits you can set (or unset) on OS X. Whereas on Linux, the man page says this:
On Linux this command can change only the O_APPEND, O_ASYNC,
O_DIRECT, O_NOATIME, and O_NONBLOCK flags.
Incidentally, some Linux filesystems have the concept of an "append-only file" at the filesystem level; if you open one of those files and then try to clear the resulting descriptor's O_APPEND flag, you'll get an EPERM error. The "append-only"-ness of a file can be controlled at the filesystem level using the chattr utility.
Here's a more systematic version of your test program. It might not be of interest to you, but I learned something by writing it, so I'm leaving it here. :)
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
int main() {
int fd = open("/tmp/fd_share.txt", O_RDWR | O_CREAT | O_TRUNC | O_APPEND, 0644);
// append to empty file
write(fd, "aaaaaaaaaa", 10);
off_t cur = lseek(fd, 1, SEEK_SET);
printf("offset after being set to 1: %ld\n", (long)cur);
// append
write(fd, "bbbbbbbb", 8);
cur = lseek(fd, 0, SEEK_CUR);
printf("offset after appending bbbbbbbb: %ld\n", (long)cur);
cur = lseek(fd, 2, SEEK_SET);
printf("offset after being set to 2: %ld\n", (long)cur);
// now toggle "append mode" to FALSE
int open_flag = fcntl(fd, F_GETFL);
if (fcntl(fd, F_SETFL, open_flag & ~O_APPEND) == -1) {
printf("failed to set flag\n");
return 0;
}
cur = lseek(fd, 0, SEEK_CUR);
printf("offset after unsetting O_APPEND: %ld\n", (long)cur);
cur = lseek(fd, 3, SEEK_SET);
printf("offset after being set to 3: %ld\n", (long)cur);
// write without appending
write(fd, "cccc", 4);
cur = lseek(fd, 0, SEEK_CUR);
printf("offset after writing cccc: %ld\n", (long)cur);
// now toggle "append mode" to TRUE
open_flag = fcntl(fd, F_GETFL);
if (fcntl(fd, F_SETFL, open_flag | O_APPEND) == -1) {
printf("failed to set flag\n");
return 0;
}
cur = lseek(fd, 0, SEEK_CUR);
printf("offset after unsetting O_APPEND: %ld\n", (long)cur);
// append
write(fd, "dd", 2);
cur = lseek(fd, 0, SEEK_CUR);
printf("offset after appending dd: %ld\n", (long)cur);
close(fd);
}
The output of this program on my MacBook (as it should be on any POSIX system AFAIK) is:
offset after being set to 1: 1
offset after appending bbbbbbbb: 18
offset after being set to 2: 2
offset after unsetting O_APPEND: 2
offset after being set to 3: 3
offset after writing cccc: 7
offset after unsetting O_APPEND: 7
offset after appending dd: 20
1) The return of fcnl is a code that described if the function succceded and how:
RETURN VALUE
For a successful call, the return value depends on the operation:
F_DUPFD The new descriptor.
F_GETFD Value of file descriptor flags.
F_GETFL Value of file status flags.
F_GETLEASE
Type of lease held on file descriptor.
F_GETOWN Value of descriptor owner.
F_GETSIG Value of signal sent when read or write becomes possible, or
zero for traditional SIGIO behavior.
F_GETPIPE_SZ, F_SETPIPE_SZ
The pipe capacity.
F_GET_SEALS
A bit mask identifying the seals that have been set for the
inode referred to by fd.
All other commands
Zero.
On error, -1 is returned, and errno is set appropriately.
ERRORS
EACCES or EAGAIN
Operation is prohibited by locks held by other processes.
EAGAIN The operation is prohibited because the file has been memory-
mapped by another process.
EBADF fd is not an open file descriptor
EBADF cmd is F_SETLK or F_SETLKW and the file descriptor open mode
doesn't match with the type of lock requested.
EBUSY cmd is F_SETPIPE_SZ and the new pipe capacity specified in arg
is smaller than the amount of buffer space currently used to
store data in the pipe.
EBUSY cmd is F_ADD_SEALS, arg includes F_SEAL_WRITE, and there
exists a writable, shared mapping on the file referred to by
fd.
EDEADLK
It was detected that the specified F_SETLKW command would
cause a deadlock.
EFAULT lock is outside your accessible address space.
EINTR cmd is F_SETLKW or F_OFD_SETLKW and the operation was
interrupted by a signal; see signal(7).
EINTR cmd is F_GETLK, F_SETLK, F_OFD_GETLK, or F_OFD_SETLK, and the
operation was interrupted by a signal before the lock was
checked or acquired. Most likely when locking a remote file
(e.g., locking over NFS), but can sometimes happen locally.
EINVAL The value specified in cmd is not recognized by this kernel.
EINVAL cmd is F_ADD_SEALS and arg includes an unrecognized sealing
bit.
EINVAL cmd is F_ADD_SEALS or F_GET_SEALS and the filesystem
containing the inode referred to by fd does not support
sealing.
EINVAL cmd is F_DUPFD and arg is negative or is greater than the
maximum allowable value (see the discussion of RLIMIT_NOFILE
in getrlimit(2)).
EINVAL cmd is F_SETSIG and arg is not an allowable signal number.
EINVAL cmd is F_OFD_SETLK, F_OFD_SETLKW, or F_OFD_GETLK, and l_pid
was not specified as zero.
EMFILE cmd is F_DUPFD and the process already has the maximum number
of file descriptors open.
ENOLCK Too many segment locks open, lock table is full, or a remote
locking protocol failed (e.g., locking over NFS).
ENOTDIR
F_NOTIFY was specified in cmd, but fd does not refer to a
directory.
EPERM Attempted to clear the O_APPEND flag on a file that has the
append-only attribute set.
EPERM cmd was F_ADD_SEALS, but fd was not open for writing or the
current set of seals on the file already includes F_SEAL_SEAL.
2) Flags to be set is your choice: :
F_SETFL (int)
Set the file status flags to the value specified by arg. File
access mode (O_RDONLY, O_WRONLY, O_RDWR) and file creation
flags (i.e., O_CREAT, O_EXCL, O_NOCTTY, O_TRUNC) in arg are
ignored. On Linux this command can change only the O_APPEND,
O_ASYNC, O_DIRECT, O_NOATIME, and O_NONBLOCK flags. It is not
possible to change the O_DSYNC and O_SYNC flags; see BUGS,
below.
3) HERE you have a complete description.
I have a sample program:
int main()
{
const char* fn = "/tmp/tmpfifo";
int i = mkfifo(fn, 0666);
int fd = open(fn, O_RDONLY | O_NONBLOCK);
int flags = fcntl(fd, F_GETFL);
flags &= ~O_NONBLOCK;
fcntl(fd, F_SETFL, flags);
char buf[1024];
int rd= read(fd, buf, 100);
cout << rd << endl;
remove(fn);
return 0;
}
It seems that after removing the non-blocking flag from the file descriptor, the read call should block until something is written into the FIFO, but my program always runs without blocking and rd=0 result. Can you please explain this behaviour? Thanks!
The behavior you are seeing is expected. You've done the following:
Opened the read end of the FIFO using O_NONBLOCK, so that a writer need not be present on the FIFO. This guarantees that the open() will immediately succeed.
Disabled O_NONBLOCK before subsequent reads. You've now taken yourself back to a position that is equivalent to the standard (blocking) case where a FIFO had a reader and writer, but the writer closed the FIFO. At that point, the reader should see end-of-file, which is what you are seeing.
It's strange! I tried a code which opens the file without O_NONBLOCK and then procedes in 3 stages. The 3rd stage doesn'act correctly althoug the O_NONBLOCK flag results reset!
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
int main()
{
char buf[1024];
int rd;
const char* fn = "prova.txt";
int i = mkfifo(fn, 0666);
int fd = open(fn, O_RDONLY); // | O_NONBLOCK);
int flags = fcntl(fd, F_GETFL);
//flags &= ~O_NONBLOCK;
printf("1) Waits!\t\tflags=0%08o\n",flags);
rd= read(fd, buf, 100);
printf("%d %o\n",rd,flags);
flags |= O_NONBLOCK;
printf("2) Doesn't wait!\tflags=0%08o\n",flags);
fcntl(fd, F_SETFL, flags);
rd= read(fd, buf, 100);
printf("%d %o\n",rd,flags);
//This doen't act the flag ????
flags &= ~O_NONBLOCK;
fcntl(fd, F_SETFL, flags);
flags=fcntl(fd, F_GETFL);
printf("3) Waits!\t\tflags=0%08o\n",flags);
rd= read(fd, buf, 100);
printf("%d %o\n",rd,flags);
puts("End!");
return 0;
}
Here is the command sequence and the output:
sergio#zarathustra:~$ ./a.out &
[2] 6555
sergio#zarathustra:~$ echo xxx >> prova.txt
1) Waits! flags=000100000
4 100000
2) Doesn't wait! flags=000104000
0 104000
3) Waits! flags=000100000
0 100000
End!
sergio#zarathustra:~$
I looked at your code and at first glance it seems like it should work. There are no errors returned, you don't seem to be breaking any rules, but it's just not blocking.
So I went ahead and traced the read call to see what it was doing:
And it goes all the way to the pipe_read function without any attempt to block. Once it's there it realizes there's nobody on the other side of the pipe and returns EOF.
So this is apparently by design, but with a pipe only the open call will try to block if there's no writer, once open returns it's just assumed that there must be a writer at the other end of that pipe or that you're nonblocking and ready to handle that. And it sort of makes sense. If you're trying to read from a pipe but the writer is gone (or was never there in the first place), you don't want to keep waiting there forever.
If you want to wait until a writer opens the pipe, don't use O_NONBLOCK in the open call. If you do use O_NONBLOCK in open, then there might not be anyone at the other end of the pipe and the read calls may just return EOF without blocking.
So in short make sure there's someone at the other end of the pipe when you're reading from it.
Given an arbitrary file descriptor, can I make it blocking if it is non-blocking? If so, how?
Its been a while since I played with C, but you could use the fcntl() function to change the flags of a file descriptor:
#include <unistd.h>
#include <fcntl.h>
// Save the existing flags
saved_flags = fcntl(fd, F_GETFL);
// Set the new flags with O_NONBLOCK masked out
fcntl(fd, F_SETFL, saved_flags & ~O_NONBLOCK);
I would expect simply non-setting the O_NONBLOCK flag should revert the file descriptor to the default mode, which is blocking:
/* Makes the given file descriptor non-blocking.
* Returns 1 on success, 0 on failure.
*/
int make_blocking(int fd)
{
int flags;
flags = fcntl(fd, F_GETFL, 0);
if(flags == -1) /* Failed? */
return 0;
/* Clear the blocking flag. */
flags &= ~O_NONBLOCK;
return fcntl(fd, F_SETFL, flags) != -1;
}