C program to mkdir in POSIX shared memory missing permissions

I have a POSIX shared memory directory which gets blown away on reboot and needs to be recreated on occasion. An example,
#include <sys/stat.h>
#include <sys/file.h>
int main(int argc, char *argv[])
{
    struct stat st = {0};
    if (stat("/dev/shm/example", &st) == -1) {
        mkdir("/dev/shm/example", 0777);
    }
}
This creates the directory with missing write permissions for group/others:
drwxr-xr-x. 2 *** *** *** May 14 12:00 example
I've tried experimenting with the mode_t flags and even replaced "0777" with "S_IRWXU | S_IRWXG | S_IRWXO". I do need the directory to have permission flag 0777.

You need to set the permissions explicitly or reset your file creation mask with the umask() function.
Per the POSIX mkdir() documentation (bolding mine):
The mkdir() function shall create a new directory with name path. The file permission bits of the new directory shall be initialized from mode. These file permission bits of the mode argument shall be modified by the process' file creation mask.
The only thread-safe way to create a file or directory with the exact permissions you want is to explicitly set them after creation:
mkdir("/dev/shm/example", 0777);
chmod("/dev/shm/example", 0777);
Of course, you'd do well to actually check the return values for those calls.
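For example, a minimal sketch with the checks added (the error handling here is mine, not part of the original answer):
#include <stdio.h>
#include <errno.h>
#include <sys/stat.h>

int main(void)
{
    const char *dir = "/dev/shm/example";

    /* Create the directory; an already-existing directory is not an error here. */
    if (mkdir(dir, 0777) == -1 && errno != EEXIST) {
        perror("mkdir");
        return 1;
    }

    /* Set the exact permission bits, independent of the process umask. */
    if (chmod(dir, 0777) == -1) {
        perror("chmod");
        return 1;
    }
    return 0;
}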
Calling umask() to set and restore your file creation mask is not thread-safe:
mode_t old = umask( 0 );
mkdir("/dev/shm/example", 0777);
umask( old );
There are multiple race conditions possible with doing that:
The old mask you get with the first call to umask() can be the 0 set by another thread
Either call can overwrite the value of the file creation mask currently in use by another thread
If either of those race conditions happens, either
your file/directory won't get the permissions you need
the other thread's file/directory won't get the permissions it needs
the original file creation mask setting will be lost
Or all three.
So don't call umask() when you need to set permission bits exactly. Set the mode explicitly with a call to [f]chmod().
(The possible issues that could arise from creating directories in /dev/shm are probably worth another question...)

Related

What happens if you try to read/write a mapping with a deleted / disconnected backing file or device?

If I perform a mmap() on some file or a device in /dev/ that exposes memory, what happens to that mapping if the file is deleted or that device disconnected?
I've written a test program to experiment, but I can't find any solid documentation on what should happen. The test program does the following:
#include <stdio.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/mman.h>
int main()
{
    /* O_CREAT requires an explicit mode argument. */
    int fd = open("test_file", O_CREAT | O_RDWR, 0644);
    uint32_t data[10000] = {0};
    data[3] = 22;
    write(fd, data, sizeof data);
    sync();

    struct stat s;
    fstat(fd, &s);

    uint32_t *map = (uint32_t *)mmap(NULL, s.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);

    sleep(10);

    printf("%u\n", *(map + 3));
    *(map + 9000) = 91;
    printf("%u\n", *(map + 9000));
    return 0;
}
My page size is 4096 bytes so I made the file larger than that so the map would span multiple pages. During that 10s sleep I run the following commands:
$ rm test_file
$ sync; echo 3 > /proc/sys/vm/drop_caches
I believe this should destroy the backing file of the map and drop all cached pages, so that any further operation on the mapping has to go back to the backing file. Then, after the 10s sleep, the program attempts to read from the first page and to write and then read a later page.
Surprisingly, the two values printed back out are 22 and 91.
Why does this happen, is this behavior guaranteed or undefined? Is it because I was using a regular file and not a device? What if the mapping was really large, would things change? Should I expect to get a SIGSEGV or a SIGBUS under some conditions?
rm just unlinks the file from a location in the filesystem (removes it from a directory). If there are other references to the file (such as a process that has it open), the file won't actually be removed -- the OS keeps a reference count of all the references to the file and only deletes it when the reference count drops to 0.
So in this case, the reference count will still be non-zero after the rm because the process has the file open. Only when the file is unmapped and closed (which happens when the process exits) will the file actually be deleted.
In the case of a device, the device file (in the filesystem) is similarly just a reference to the device driver. Removing it won't have any effect. However, if the device itself has some concept of being removed (such as removable storage), doing that will result in future accesses returning an error.
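A small sketch of mine that makes this visible: open and unlink a scratch file, then keep using it through the still-open descriptor (the file name is just an example):
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>

int main(void)
{
    int fd = open("scratch_file", O_CREAT | O_RDWR, 0644);
    unlink("scratch_file");   /* same effect as rm: only the directory entry goes away */

    struct stat st;
    fstat(fd, &st);
    printf("link count after unlink: %ld\n", (long)st.st_nlink);   /* prints 0 */

    /* The inode still exists while fd is open, so I/O keeps working. */
    write(fd, "hello", 5);
    char buf[6] = {0};
    pread(fd, buf, 5, 0);
    printf("read back: %s\n", buf);

    close(fd);   /* last reference dropped: the data is actually freed now */
    return 0;
}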
What happens if you try to read/write a mapping with a deleted ... file
You will still write to that file. rm only unlinks the name from the directory, the file still exists.
disconnected backing ... device?
The process will receive a SIGBUS signal.
Why does this happen, is this behavior guaranteed or undefined?
Guaranteed; files have always been reference counted.
Is it because I was using a regular file and not a device?
No. A device node behaves much like a regular file here. You can open("/dev/sda"), write to it, and rm /dev/sda. In Linux almost everything is accessed as a file; a regular file just has one more layer of indirection in the kernel - the filesystem.
What if the mapping was really large, would things change?
Should I expect to get a SIGSEGV or a SIGBUS under some conditions?
See man mmap. Search for SIGBUS.
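For a concrete case with a regular file (my sketch, not from the answer): per man mmap, touching a mapped page that no longer corresponds to any part of the file, for example after the file has been truncated, raises SIGBUS:
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    int fd = open("test_file", O_CREAT | O_RDWR | O_TRUNC, 0644);
    ftruncate(fd, 8192);                 /* two 4096-byte pages of backing store */

    char *map = mmap(NULL, 8192, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    map[5000] = 'x';                     /* fine: the second page is backed by the file */

    ftruncate(fd, 4096);                 /* cut the file down to a single page */
    map[5000] = 'y';                     /* second page no longer backed: raises SIGBUS */

    printf("not reached\n");
    return 0;
}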

Kernel module check if file exists

I'm making some extensions to the kernel module nandsim, and I'm having trouble finding the correct way to test if a file exists before opening it. I've read this question, which covers how the basic open/read/write operations go, but I'm having trouble figuring out if and how the normal open(2) flags apply here.
I'm well aware that file reading and writing in kernel modules is bad practice; this code already exists in the kernel and is already reading and writing files. I am simply trying to make a few adjustments to what is already in place. At present, when the module is loaded and instructed to use a cache file (specified as a string path when invoking modprobe), it uses filp_open() to open the file or create it if it does not exist:
/* in nandsim.c */
...
module_param(cache_file, charp, 0400);
...
MODULE_PARM_DESC(cache_file, "File to use to cache nand pages instead of memory");
...
struct file *cfile;
cfile = filp_open(cache_file, O_CREAT | O_RDWR | O_LARGEFILE, 0600);
You might ask, "what do you really want to do here?" I want to include a header for the cache file, such that it can be reused if the system needs to be reset. By including information about the nand page geometry and page count at the beginning of this file, I can more readily simulate a number of error conditions that otherwise would be impossible within the nandsim framework. If I can bring down the nandsim module during file operations, or modify the backing file to model a real-world fault mode, I can recreate the net effect of these error conditions.
This would allow me to bring the simulated device back online using nandsim, and assess how well a fault-tolerant file system is doing its job.
My thought process was to modify it as follows, such that it would fail trying to force creation of a file which already exists:
struct file *cfile;
cfile = filp_open(cache_file, O_CREAT | O_EXCL | O_RDWR | O_LARGEFILE, 0600);
if (IS_ERR(cfile)) {
    printk(KERN_INFO "File didn't exist: %ld", PTR_ERR(cfile));
    /* Do header setup for first-time run of NAND simulation */
} else {
    /* Read header and validate against system parameters. Recover operations */
}
What I'm seeing is an error, but it is not the one I would have expected. It is reporting errno 14, EFAULT (bad address) instead of errno 17 EEXIST (File exists). I don't want to run with this because I would like this to be as idiomatic and correct as possible.
Is there some other way I should be doing this?
Do I need to somehow specify that the file path is in user address space? If so, why is that not the case in the code as it was?
EDIT: I was able to get a reliable error by trying to open with only O_RDWR and O_LARGEFILE, which resulted in ENOENT. It is still not clear why my original approach was incorrect, nor what the best way to accomplish my goal is. That said, if someone more experienced could comment on this, I can add it as a solution.
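For reference, the workaround described in that edit would look roughly like this (my sketch, not the actual nandsim code): probe for the file without O_CREAT first, and only create it when the probe fails with -ENOENT:
struct file *cfile;

/* Probe for an existing cache file first (no O_CREAT). */
cfile = filp_open(cache_file, O_RDWR | O_LARGEFILE, 0);
if (IS_ERR(cfile)) {
    if (PTR_ERR(cfile) != -ENOENT)
        return PTR_ERR(cfile);                  /* some other failure */

    /* No existing file: create it and write a fresh header. */
    cfile = filp_open(cache_file, O_CREAT | O_RDWR | O_LARGEFILE, 0600);
    if (IS_ERR(cfile))
        return PTR_ERR(cfile);
    /* ... write a header describing the NAND page geometry ... */
} else {
    /* File exists: read and validate the header, then recover. */
}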
Indeed, filp_open() expects a file path that is in kernel address space; the proof is its use of getname_kernel(). You can mimic this for your use case with something like this:
struct filename *name = getname(cache_file);
struct file *cfile = ERR_CAST(name);

if (!IS_ERR(name)) {
    cfile = file_open_name(name, O_CREAT | O_EXCL | O_RDWR | O_LARGEFILE, 0600);
    putname(name);               /* release the name on both the success and error paths */
    if (IS_ERR(cfile))
        return PTR_ERR(cfile);
}
Note that getname() expects a user-space address; it is the user-space counterpart of getname_kernel().

Why call shm_unlink first before calling shm_open?

I have seen the following code pattern in a legacy project:
Step 1: Check whether a shared-memory object with the name "/abc" has already been created:
int fd = shm_open("/abc", O_RDWR, 0777);
if (fd != -1)
{
    close(fd);
    return -1;
}
Step 2: Remove an object previously created by shm_open():
shm_unlink("/abc");
Step 3: Create a shared memory object:
fd = shm_open("/abc", (O_CREAT | O_RDWR), S_IWUSR);
Is Step 2 redundant?
The code only reaches Step 2 when no shared-memory object named "/abc" exists; in other words, the code returns early if the object does exist. So why should we explicitly call shm_unlink() to remove a non-existent object?
Can we shorten the three steps to just one?
I think we can proceed as follows, using the O_EXCL flag to check whether an old memory object exists and to create it if it doesn't exist at all. The shm_open() man page says:
O_EXCL
If O_CREAT was also specified, and a shared memory object
with the given name already exists, return an error. The
check for the existence of the object, and its creation if
it does not exist, are performed atomically.
So it should be okay to replace all the code above with a single line:
int fd = shm_open("/abc", O_RDWR | O_EXCL, 0777);
Is that correct?
Is Step 2 redundant?
It is. It serves no purpose.
Besides, this check-for-existence pattern is prone to a TOCTOU (time-of-check to time-of-use) race.
Can we shorten the three steps above to a single step?
Yes. That's the right way to go about it. But you'll also need the O_CREAT flag (which is missing in your code).
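Putting that together, a minimal sketch of the single-call version (the object name is carried over from the question; the mode and size are just examples):
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(void)
{
    /* Atomically create the object, failing if it already exists. */
    int fd = shm_open("/abc", O_CREAT | O_EXCL | O_RDWR, S_IRUSR | S_IWUSR);
    if (fd == -1) {
        perror("shm_open");        /* EEXIST means another process created it first */
        return 1;
    }

    ftruncate(fd, 4096);           /* a new object has size 0, so give it a size */
    /* ... mmap() and use the memory ... */
    close(fd);
    return 0;
}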

Location of /dev/shm on Mac OS X

I am working on a shared memory assignment on Mac OS X
#define SHARED_OBJECT_PATH "/my_shared_memory"

fd = shm_open(SHARED_OBJECT_PATH, O_CREAT | O_EXCL | O_RDWR, S_IRWXU | S_IRWXG);
if (fd < 0) {
    perror("In shm_open()");
    exit(1);
}
The above is one of the small snippets in the program.
When I compile and run the program a second time, I get the error:
In shm_open(): File exists
I am assuming this is because I need to manually delete it using rm [path_to]/my_shared_memory. I know that on Linux the default location is /dev/shm; however, this path does not exist on Mac OS X.
Where is the location of my_shared_memory so I can delete it?
The simplest solution to your problem is not using O_EXCL if you don't want that behaviour.
Generally, shared memory objects do have a name, but it's not really a file name -- you can't always delete them through the filesystem. Many systems display them under /dev/shm, but this depends on your OS:
My best guess would be that you should read what man shm_open says on your machine.
Under macOS, which is derived from BSD, there are no visible file entries in the file system for shared memory objects. See https://stackoverflow.com/a/73752984/14393739 for more details.
As a consequence, it is not possible to clean them up with an rm command or an unlink() call. The O_EXCL flag should therefore be used with care: at program startup, try shm_open() without O_EXCL and O_CREAT first; if that call fails, retry with both flags, as sketched below.
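Something along these lines (my sketch of that startup sequence; the helper name and object size are illustrative):
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Hypothetical helper: attach to an existing object, or create it if absent. */
int open_or_create_shm(const char *name)
{
    /* First try to attach to an object that already exists. */
    int fd = shm_open(name, O_RDWR, 0);
    if (fd >= 0)
        return fd;

    /* Nothing there yet: create it atomically. */
    fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, S_IRWXU | S_IRWXG);
    if (fd >= 0)
        ftruncate(fd, 4096);   /* a freshly created object has size 0 */
    return fd;                 /* -1 on failure; errno says why (e.g. a lost race) */
}

int main(void)
{
    int fd = open_or_create_shm("/my_shared_memory");
    if (fd < 0) {
        perror("shm_open");
        return 1;
    }
    close(fd);
    return 0;
}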

setgid Operation not permitted

I have a C program that calls setgid() with the group id of the group "agrp", and it is saying "Operation not permitted" when I try to run it.
The program has the following ls -la listing:
-r-xr-s--x 1 root agrp 7508 Nov 18 18:48 setgidprogram
What I want is for setgidprogram to be able to access a file that has the owner otheruser and the group agrp, with permissions set to u+rw,g+rw (user- and group-readable/writeable).
What am I doing wrong? Does setgidprogram HAVE to have the setuid bit set also? (When I tried it, it worked.)
I am running Fedora 19, and I have SELinux disabled.
EDIT
Here is some example code:
wrap.c:
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>
#include <grp.h>

int main(void)
{
    struct group *grp = getgrnam("agrp");
    printf("%d\n", grp->gr_gid);
    if (setgid(grp->gr_gid) != 0)
    {
        printf("%s.\n", strerror(errno));
        return 1;
    }
    execl("/tmp/whoami_script.sh", "/tmp/whoami_script.sh", (char *)NULL);
    printf("%s.\n", strerror(errno));
    return 0;
}
/tmp/whoami_script.sh:
#!/usr/bin/bash
id
$ ls -la /tmp/whoami_script.sh wrap
-r-xr-xr-x 1 root agrp 19 Nov 18 19:53 /tmp/whoami_script.sh
$ ./wrap
1234
uid=1000(auser) gid=1000(auser) groups=1000(auser),0(root),10(wheel)
---x--s--x 1 root agrp 7500 Nov 18 19:55 wrap
Is this enough information now?
The original version of the question showed 6550 permission on the file.
If you're not either user root or in group agrp, you need to be able to use the public execute permissions on the program — which are missing. Since it is a binary, you don't need read permission. To fix it:
# chmod o+x setgidprogram
(The # denotes 'as root or via sudo', or equivalent mechanisms.) As it stands, only people who already have the relevant privileges can use the program.
If the program is installed SGID agrp, there is no need for the program to try to do setgid(agrp_gid) internally. The effective GID will be the GID belonging to agrp and the program will be able to access files as any other member of agrp could.
That said, normally you can do a no-op setgid() successfully. For example, this code works fine:
#include <stdio.h>
#include <unistd.h>
#include "stderr.h"
int main(int argc, char **argv)
{
    err_setarg0(argv[argc-argc]);
    gid_t gid = getegid();
    if (setgid(gid) != 0)
        err_syserr("Failed to setgid(%d)\n", (int)gid);
    puts("OK");
    return 0;
}
(You just have to accept that the err_*() functions do error reporting; the argc-argc trick avoids a warning/error from the compiler about the otherwise unused argument argc.)
If you make the program SUID root, then the SGID property doesn't matter much; the program will run with EUID root and that means it can do (almost) anything. If it is SUID root, you should probably be resetting the EUID to the real UID:
setuid(getuid());
before invoking the other program. Otherwise, you're invoking the other program as root, which is likely to be dangerous.
Dissecting POSIX
In his answer, BenjiWiebe states:
The problem was I was only setting my effective GID, not my real GID. Therefore, when I exec'd, the child process was started with the EGID set to the RGID. So, in my code, I used setregid() which worked fine.
Yuck; which system does that? Linux trying to be protective? It is not the way things worked classically on Unix, that's for sure. However, the POSIX standard seems to have wriggle room in the verbiage (for execvp()):
If the ST_NOSUID bit is set for the file system containing the new process image file, then the effective user ID, effective group ID, saved set-user-ID, and saved set-group-ID are unchanged in the new process image. Otherwise, if the set-user-ID mode bit of the new process image file is set, the effective user ID of the new process image shall be set to the user ID of the new process image file. Similarly, if the set-group-ID mode bit of the new process image file is set, the effective group ID of the new process image shall be set to the group ID of the new process image file. The real user ID, real group ID, and supplementary group IDs of the new process image shall remain the same as those of the calling process image. The effective user ID and effective group ID of the new process image shall be saved (as the saved set-user-ID and the saved set-group-ID) for use by setuid().
If I'm parsing that right, then we have a number of scenarios:
1. ST_NOSUID is set.
2. ST_NOSUID is not set, but the SUID or SGID bit is set on the executable.
3. ST_NOSUID is not set, and neither the SUID nor the SGID bit is set on the executable.
In case 1, it is fairly clearly stated that the EUID and EGID of the exec'd process are the same as in the original process (and if the EUID and RUID are different in the original process, they will be different in the child).
In case 2, if the SUID bit is set on the executable, the EUID will be set to the SUID. Likewise if the SGID bit is set on the executable, the EGID will be set to the SGID. It is not specified what happens if the SUID bit is set, the SGID bit is not set, and the original process has different values for EGID and RGID; nor, conversely, is it specified what happens if the SGID bit is set, the SUID bit is not set, and the original process has different values for EUID and RUID.
Case 3, where neither the SUID nor SGID bit is set on the executable, also seems to be unspecified behaviour.
Classically on Unix systems, the EUID and RUID could be different, and the difference would be inherited across multiple (fork() and) exec() operations if the executable does not override the EUID or EGID with its own SUID or SGID bits. However, it is not clear that the POSIX standard mandates or prohibits this; it seems to be unspecified behaviour. The rationale section provides no guidance on the intentions.
If my reading is correct, then I find it amusing that the ST_NOSUID bit means that if a program is launched by a process that is running SUID, then the program on the 'no SUID' file system will be run with different real and effective UID (RUID and EUID), which seems counter-intuitive. It doesn't matter what the SUID and SGID bits on the executable are set to (so the bits on the executable are ignored), but the inherited values of EUID and RUID are maintained.
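To see which of these cases applies on a given system, a tiny check program like this (mine, not part of the original answer) can be exec'd in place of the id command used earlier; it prints the real and effective IDs the new process ends up with:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int main(void)
{
    /* Show real vs effective IDs so SUID/SGID propagation across exec is visible. */
    printf("ruid=%d euid=%d rgid=%d egid=%d\n",
           (int)getuid(), (int)geteuid(), (int)getgid(), (int)getegid());
    return 0;
}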
This code finally worked:
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>
#include <grp.h>

int main(void)
{
    gid_t g = getegid();
    if (setregid(g, g) != 0)
    {
        printf("Error setting GID: %s.\n", strerror(errno));
    }
    execl("/tmp/whoami_script.sh", "/tmp/whoami_script.sh", (char *)NULL);
    printf("Error: %s.\n", strerror(errno));
    return 0;
}
The problem was I was only setting my effective GID, not my real GID. Therefore, when I exec'd, the child process was started with the EGID set to the RGID. So, in my code, I used setregid which worked fine.
