nftw wants a parameter for number of file handles to use, and doesn't seem to have a way to say 'as many as possible'. Specifying 255 seems to work on Linux, but fails on BSD. Apparently OPEN_MAX is the recommended solution on BSD, but I can't use this as it doesn't work on Linux.
Is there a portable equivalent of OPEN_MAX that will work on both Linux and BSD?
Alternatively, is there some number that is large enough not to slow things down, and that is portable for practical purposes (ideally specified in POSIX, or at least one that works on every Unix-like system with significant market share)?
Advanced Programming in the Unix Environment, 2nd Ed gives us the following code, which should work everywhere. Though it is pretty clever, I think it is a little unfortunate that it doesn't also check the rlimits of the process, since the rlimits can further constrain how many open files a process may use. That aside, here's the code from The Master:
#include "apue.h"    /* from the APUE sources: declares err_sys() and
                      * pulls in the standard headers this file needs */

#ifdef OPEN_MAX
static long openmax = OPEN_MAX;
#else
static long openmax = 0;
#endif

/*
 * If OPEN_MAX is indeterminate, we're not
 * guaranteed that this is adequate.
 */
#define OPEN_MAX_GUESS 256

long
open_max(void)
{
    if (openmax == 0) {        /* first time through */
        errno = 0;
        if ((openmax = sysconf(_SC_OPEN_MAX)) < 0) {
            if (errno == 0)
                openmax = OPEN_MAX_GUESS;    /* it's indeterminate */
            else
                err_sys("sysconf error for _SC_OPEN_MAX");
        }
    }
    return(openmax);
}
(err_sys() is provided in the apue.h header with the sources -- should be easy to code a replacement for your routine.)
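Addressing the caveat above about rlimits: here's a minimal sketch of my own (not from APUE) that clamps the sysconf() value against the process's soft RLIMIT_NOFILE. Note that on some systems (e.g. Linux/glibc) sysconf(_SC_OPEN_MAX) already reflects the soft limit, so the clamp may be a no-op there:
#include <errno.h>
#include <unistd.h>
#include <sys/resource.h>

/* Sketch: like open_max(), but also honours the soft RLIMIT_NOFILE. */
long
open_max_rlimited(void)
{
    long openmax;
    struct rlimit rl;

    errno = 0;
    openmax = sysconf(_SC_OPEN_MAX);
    if (openmax < 0 && errno == 0)
        openmax = 256;    /* indeterminate: fall back to a guess */

    /* the soft rlimit can further constrain the process */
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0 &&
        rl.rlim_cur != RLIM_INFINITY &&
        (long)rl.rlim_cur < openmax)
        openmax = (long)rl.rlim_cur;

    return openmax;
}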
See getdtablesize. It has a conformance note:
SVr4, 4.4BSD (the getdtablesize() function first appeared in 4.2BSD). It is not specified in POSIX.1-2001; portable applications should employ sysconf(_SC_OPEN_MAX) instead of this call.
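Tying this back to the original nftw() question, here is a sketch (my own, untested) that derives the nopenfd argument from sysconf(_SC_OPEN_MAX), keeping a few descriptors spare for the rest of the program; the margin of 8 and the visit() callback are arbitrary placeholders:
#define _XOPEN_SOURCE 500    /* for nftw() on glibc */
#include <ftw.h>
#include <unistd.h>

/* hypothetical visitor; your real callback goes here */
static int
visit(const char *path, const struct stat *sb, int typeflag, struct FTW *ftwbuf)
{
    (void)path; (void)sb; (void)typeflag; (void)ftwbuf;
    return 0;
}

int
walk(const char *dirpath)
{
    long max_fds = sysconf(_SC_OPEN_MAX);
    if (max_fds < 0)
        max_fds = 256;    /* indeterminate: fall back to a guess */
    /* leave some descriptors free for stdio, logging, etc. */
    return nftw(dirpath, visit, (int)(max_fds > 8 ? max_fds - 8 : max_fds),
                FTW_PHYS);
}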
What gets me really confused is that some programmers use "filename" for what I call the path, e.g. /Users/example/Desktop/file.txt. What I would call the file name is just file.txt, but apparently some people refer to the whole path /Users/example/Desktop/file.txt as the filename. That being said, are the macros FILENAME_MAX and PATH_MAX the same?
FILENAME_MAX is defined in stdio.h, so it's cross-platform, but am I giving myself extra work doing this:
#ifdef __linux__
# include <linux/limits.h>
#elif defined(_WIN32)
# include <windows.h>
# define PATH_MAX MAX_PATH
#else
# include <sys/syslimits.h>
#endif
when just doing #include <stdio.h> and using FILENAME_MAX would be enough? Is FILENAME_MAX sufficient to create path buffers large enough to hold any file path?
tl;dr Don't trust either of them.
A "path" is an absolute path like /home/you/foo.txt or a relative path like you/foo.txt or even foo.txt.
A "filename" is poorly defined. It could be just the "basename" foo.txt or it could be a synonym for "path". The C standard seems to use "filename" to mean "path". For example, fopen takes a filename which is really a path.
FILENAME_MAX is standard C, PATH_MAX is POSIX, MAX_PATH is Windows.
FILENAME_MAX is a C standard constant...
which expands to an integer constant expression that is the size needed for an array of char large enough to hold the longest file name string that the implementation guarantees can be opened.
PATH_MAX is not part of the C standard. It is a POSIX constant...
Maximum number of bytes in a pathname, including the terminating null character.
If the path is too long, POSIX functions should fail with an ENAMETOOLONG error, but not all implementations enforce this.
MAX_PATH is a Windows API constant...
In the Windows API (with some exceptions discussed in the following paragraphs), the maximum length for a path is MAX_PATH, which is defined as 260 characters. A local path is structured in the following order: drive letter, colon, backslash, name components separated by backslashes, and a terminating null character. For example, the maximum path on drive D is "D:\some 256-character path string"
For example, C standard fopen uses FILENAME_MAX. POSIX open uses PATH_MAX.
But probably don't trust them.
FILENAME_MAX is not safe to use to allocate memory because it might be INT_MAX. The GNU C Library documentation warns...
Unlike PATH_MAX, this macro is defined even if there is no actual limit imposed. In such a case, its value is typically a very large number. This is always the case on GNU/Hurd systems.
Usage Note: Don't use FILENAME_MAX as the size of an array in which to store a file name! You can't possibly make an array that big! Use dynamic allocation (see Memory Allocation) instead.
PATH_MAX may not be defined.
Because they are hard coded by the operating system, neither can be trusted to be a true representation of what the file system can handle. For example, my Mac defines both PATH_MAX and FILENAME_MAX as 1024. A FAT32 filesystem has a limit of 255 characters. If I mount a FAT32 filesystem on my Mac, PATH_MAX and FILENAME_MAX will be incorrect.
Evan Klitzke describes the problems with PATH_MAX and by extension FILENAME_MAX.
If a function requires you to allocate a buffer to store a path, there is often an alternative which will allocate memory for you, or which will take a max size. If you're left with no choice, 1024 or 4096 are good choices.
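As an illustration of the dynamic-allocation approach, here is a hypothetical helper (path_join is my own name, not a standard function) that sizes the buffer with snprintf() before allocating, so no fixed PATH_MAX-style constant is needed:
#include <stdio.h>
#include <stdlib.h>

/* join dir and name into a freshly allocated buffer;
 * the caller must free() the result; returns NULL on failure */
char *
path_join(const char *dir, const char *name)
{
    int n = snprintf(NULL, 0, "%s/%s", dir, name);    /* measure first */
    if (n < 0)
        return NULL;
    char *p = malloc((size_t)n + 1);
    if (p)
        snprintf(p, (size_t)n + 1, "%s/%s", dir, name);
    return p;
}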
It is often (quite arbitrarily) set so that PATH_MAX equals 4096 and NAME_MAX (FILENAME_MAX?) equals 255, as mentioned here.
In practice these should be considered legacy, since some file names can end up longer than PATH_MAX / FILENAME_MAX, depending on how path name lengths, path name concatenation, and path name expansion are handled (not to mention differences in string encodings).
In addition, sometimes the defined PATH_MAX value has little or nothing to do with actual limits (if any) imposed by the OS.
For a long time (maybe still), Windows systems defined PATH_MAX as 260, while the OS actually supported path names of up to roughly 32K characters (there are some details here).
These aren't comparable at all. The *nix constant PATH_MAX is the largest path you can pass to a system call on *nix operating systems, but longer paths can exist because system calls accept relative paths, while the Windows constant MAX_PATH was the largest possible absolute path that could occur on Windows. In fact the MAX_PATH restriction has since been lifted from the API, but you need to opt in to using longer paths.
PATH_MAX is normally a fixed constant (on POSIX systems) that specifies the maximum length of the path parameter you pass to the kernel in a system call (e.g. the open(2) syscall).
But the maximum path length a file can have is not limited by this constant, as you can test with this simple shell script:
$ i=0
> while [ "$i" -lt 2000 ]
> do mkdir "$i"
> cd "$i"
> i=`expr "$i" + 1`
> done
$ pwd
/home/lcu/0/1/2/3/4/5/6/7/8/9/[intermediate output not shown...]/1996/1997/1998/1999
Now, if you try:
$ cd `pwd`
-bash: cd: /home/lcu/0/1/2/3/4/5/6/7/8/9/[intermediate output not shown...]/1996/1997/1998/1999: File name too long
Showing you that the problem arises when bash tries to do the chdir(2) system call. (There's a little routine below that shows you how to open a file whose name is longer than the PATH_MAX constant.)
You will end up in a directory whose full path has many more characters in it than the PATH_MAX constant.
That doesn't mean you cannot access that file from the root directory, but you have to do it by either
changing your current directory (as the script does), or
using openat(2) and friends, with an already-opened directory closer to the target as the starting point.
The absolute number of path elements a file can have to the root node is limited only by the number of inodes in the filesystem, and the total capacity of it.
Edit
when just doing #include <stdio.h> and using FILENAME_MAX would be enough? Is FILENAME_MAX sufficient to create path buffers large enough to hold any file path?
The only way to support any path length when opening a file on a POSIX operating system, and to be portable at the same time, is to divide your path into short enough chunks and do n-1 chdir(2) calls down to the place where your file actually lives. Then call open(2) with the last chunk, and afterwards return to the directory you were in, if that's possible, with the fchdir(2) system call (if your system has it). If you have the openat(2) system call, you can instead open the directories along the way (with open()), closing each previous one (as you only use it to open the next), until you get close enough to the final directory to open the file itself with openat(2).
Below is a (work in progress) my_fopen() call that tries to open a file with a path longer than the PATH_MAX limit from <limits.h>:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include "myfopen.h"    /* declares my_fopen() and the ERR() logging macro */

FILE *my_fopen(const char *filename, const char *mode)
{
    if (strlen(filename) > PATH_MAX) {
        char *work_name = strdup(filename);
        if (!work_name) {
            return NULL;    /* cannot malloc */
        }
        char *to_free = work_name;    /* to free at end */
        int dir_fd = -1;
        while (strlen(work_name) >= PATH_MAX) {
            char *p = work_name + PATH_MAX - 1;
            while (*p != '/')
                p--;
            /* p points to the last '/' before the PATH_MAX limit */
            *p++ = '\0';
            if (dir_fd < 0) {
                dir_fd = open(work_name, O_RDONLY);
                if (dir_fd < 0) {
                    ERR("open: %s: %s\n", work_name,
                        strerror(errno));
                    free(to_free);
                    return NULL;
                }
            } else {
                int aux_fd = openat(dir_fd, work_name, O_RDONLY);
                close(dir_fd);
                if (aux_fd < 0) {
                    ERR("openat: %s: %s\n", work_name,
                        strerror(errno));
                    free(to_free);
                    return NULL;
                }
                dir_fd = aux_fd;
            }
            work_name = p;
        }
        /* strlen(work_name) < PATH_MAX,
         * work_name points to the last chunk and
         * dir_fd is the directory to base the real fopen on
         */
        int fd = openat(dir_fd, work_name,
                        O_RDONLY);    /* this needs to be adjusted,
                                       * based on what is specified
                                       * in the mode string */
        close(dir_fd);
        if (fd < 0) {
            fprintf(stderr, "openat: %s: %s\n",
                    work_name, strerror(errno));
            free(to_free);
            return NULL;
        }
        free(to_free);
        return fdopen(fd, mode);
    }
    return fopen(filename, mode);
}
(You will need to work out the appropriate set of bit masks to pass to the final openat() call, in order to map the fopen()-style mode string onto open()-style flags.)
The basic problem with the maximum path length (and the reason the kernel designers don't support an unbounded buffer to hold the full name of the file) is a security one. If you could pass a 20GB-long filename into kernel space, you would probably exhaust kernel memory, and that cannot be permitted (it would be a serious weakness if a malicious user could lock up the whole kernel just by passing a very long filename).
In the normal case, I have never had to deal with paths longer than 1024 bytes, except in demonstrations of this specific problem, so IMHO you should accept that limit and try to use short paths (shorter than 1024 is a good limit).
On the other hand, you mention several constants here:
FILENAME_MAX is used by the stdio system, but its value is meant to be used only with stdio routines. Its documentation states that:
This macro constant expands to an integral expression corresponding to the size needed for an array of char elements to hold the longest file name string allowed by the library. Or, if the library imposes no such restriction, it is set to the recommended size for character arrays intended to hold a file name.
This means that it's a safe length you can rely on in order to call fopen(3) and pass it a working filename.
PATH_MAX is a value that you can probably use safely on your system. If you try to open files (with the open(2) system call), erase them (unlink(2)), rename them, etc., you'll probably run into trouble if you do so with paths longer than this.
The limitation on the file path length comes from the system call subsystem, and not from the filesystem involved in the operation, so normally the limitation applies to all files in the system, and not to a specific mounted filesystem (which can impose further limits of its own, or not).
In my opinion, the limits are well thought out, and using the published values will be fine for the majority of cases. And no tool in the operating system allows you to open a file with a path longer than PATH_MAX.
Consider this code (error checking removed for brevity):
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main()
{
    int fd, nread;
    struct stat st_buff;

    /* Get information about the file */
    stat("data", &st_buff);

    /* Open file data for reading */
    char strbuff[st_buff.st_blksize];
    fd = open("data", O_RDONLY);

    /* read and write data */
    do {
        nread = read(fd, strbuff, st_buff.st_blksize);
        if (!nread)
            break;
        write(STDOUT_FILENO, strbuff, nread);
    } while (nread == st_buff.st_blksize);

    /* close the file */
    close(fd);
    return 0;
}
This code allocates memory on the stack for the buffer (if I am not misunderstanding something). There is also the alloca() function, which I could have used for the same purpose (I guess). I was wondering if there is any reason why I would want to choose one over the other?
You'd generally want to use a VLA as you have above, because it's clean and standard, whereas alloca is ugly and not in the standard (well, not in the C standard, anyway -- it probably is in POSIX).
I'm pretty sure both are the same at machine-code level. Both take the memory from the stack. This has the following implications:
The esp is moved by an appropriate value.
The claimed stack memory is probed.
The function will have to have a proper stack frame (i.e. it should use ebp to access other locals).
Both methods do this. Neither has "conventional" error handling (they raise a SEH exception on Windows, and whatever the platform does on Linux).
There is a reason to choose one over the other if you care about portability: VLAs are standard C99 (made optional in C11), while alloca is not in the C standard at all, though it is very widely available.
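One practical difference worth knowing (my own illustration, not from the answers above) is lifetime: a VLA is deallocated when its enclosing block exits, while alloca() memory lives until the whole function returns:
#include <alloca.h>    /* nonstandard; on some systems alloca is in <stdlib.h> */
#include <string.h>

void
demo(size_t n)
{
    {
        char vla_buf[n];         /* VLA: released when this block exits */
        memset(vla_buf, 0, n);
    }
    char *al_buf = alloca(n);    /* released only when demo() returns */
    memset(al_buf, 0, n);
}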
P.S. Consider using _malloca?
Does anyone know how to get the current thread ID as an integer on BSD?
I found this:
#ifdef RTHREADS
299 STD { pid_t sys_getthrid(void); }
300 STD { int sys_thrsleep(void *ident, int timeout, void *lock); }
301 STD { int sys_thrwakeup(void *ident, int n); }
302 STD { int sys_threxit(int rval); }
303 STD { int sys_thrsigdivert(sigset_t sigmask); }
#else
299 UNIMPL
300 UNIMPL
301 UNIMPL
302 UNIMPL
303 UNIMPL
#endif
and tried (long)syscall(229), but it does not work (it crashes). On Linux I can get the thread ID with the system call (long)syscall(224), which gives me an integer (usually 4 digits). Can anyone help? Thank you.
There is no such thing as "BSD". Every *BSD system is completely different, especially when it comes to threads. Even within a single project like FreeBSD, there are various pthread implementations (libc_r, kse, thr) that vary between OS versions and user configuration.
Having said that, on FreeBSD-8 there should be int thr_self(long *id) in /usr/include/sys/thr.h and on reasonably fresh NetBSD there is lwpid_t _lwp_self(void) in /usr/include/lwp.h.
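For example, on FreeBSD that interface could be used like this (a sketch based on the declaration above, untested):
#include <sys/types.h>
#include <sys/thr.h>
#include <stdio.h>

int
main(void)
{
    long tid;
    if (thr_self(&tid) == 0)    /* fills tid with the kernel thread id */
        printf("thread id: %ld\n", tid);
    return 0;
}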
For more platforms you can take a look at int get_unix_tid(void) in wine source.
Find out which <sys/types.h> gets included in your C translation units (by checking along your include path(s)). pid_t is defined there. It's a signed integral type, but there are a few of those; it could easily be wider than a long.
The Open Groups documentation of sys/types.h promises "The implementation shall support one or more programming environments in which the widths of blksize_t, pid_t, size_t, ssize_t, suseconds_t, and useconds_t are no greater than the width of type long. The names of these programming environments can be obtained using the confstr() function or the getconf utility." So you can probably cast a pid_t to long (or at least use getconf to find out what you have to do to be in a situation where pid_t can be safely cast to long).
See C Language Gotchas: printf format strings for a discussion of why what you want to do is complicated, cannot be written portably, and may suddenly break in the future.
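In practice, the usual workaround is to cast to a wide, known type and print with the matching format; a sketch using getpid() (the same pattern applies to any pid_t value):
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int
main(void)
{
    /* pid_t's width is unspecified, so cast through long
     * (or intmax_t with %jd on C99 systems) before printing */
    printf("pid = %ld\n", (long)getpid());
    return 0;
}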
The default limit for the maximum number of open files on Mac OS X is 256 (ulimit -n), and my application needs about 400 file handles.
I tried to change the limit with setrlimit(), but even though the function executes correctly, I'm still limited to 256.
Here is the test program I use:
#include <stdio.h>
#include <sys/resource.h>

int main()
{
    struct rlimit rlp;
    FILE *fp[10000];
    int i;

    getrlimit(RLIMIT_NOFILE, &rlp);
    printf("before %d %d\n", (int)rlp.rlim_cur, (int)rlp.rlim_max);

    rlp.rlim_cur = 10000;
    setrlimit(RLIMIT_NOFILE, &rlp);

    getrlimit(RLIMIT_NOFILE, &rlp);
    printf("after %d %d\n", (int)rlp.rlim_cur, (int)rlp.rlim_max);

    for (i = 0; i < 10000; i++) {
        fp[i] = fopen("a.out", "r");
        if (fp[i] == 0) {
            printf("failed after %d\n", i);
            break;
        }
    }
    return 0;
}
and the output is:
before 256 -1
after 10000 -1
failed after 253
I cannot ask the people who use my application to poke inside a /etc file or something. I need the application to do it by itself.
rlp.rlim_cur = 10000;
Two things.
1st. LOL. Apparently you have found a bug in Mac OS X's stdio. If I fix your program up (add error handling, etc.) and also replace fopen() with the open() syscall, I can easily reach the limit of 10000 (which is 240 fds below my 10.6.3 OPEN_MAX limit of 10240).
2nd. RTFM: man setrlimit. The case of max open files has to be treated specially with regard to OPEN_MAX.
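What the man page means is something like this sketch (mine, untested): on Mac OS X the soft limit passed to setrlimit() may be set no higher than OPEN_MAX:
#include <limits.h>    /* OPEN_MAX (via <sys/syslimits.h> on Mac OS X) */
#include <sys/resource.h>

/* sketch: raise the soft fd limit as far as Mac OS X allows */
static int
raise_nofile_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    /* rlim_cur may be set no higher than OPEN_MAX here */
    rl.rlim_cur = (rl.rlim_max == RLIM_INFINITY || rl.rlim_max > OPEN_MAX)
                      ? OPEN_MAX : rl.rlim_max;
    return setrlimit(RLIMIT_NOFILE, &rl);
}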
etresoft found the answer on the Apple discussion board:
The whole problem here is your printf() function. When you call printf(), you are initializing internal data structures to a certain size. Then, you call setrlimit() to try to adjust those sizes. That function fails because you have already been using those internal structures with your printf(). If you use two rlimit structures (one for before and one for after), and don't print them until after calling setrlimit, you will find that you can change the limits of the current process even in a command line program. The maximum value is 10240.
For some reason (perhaps binary compatibility), you have to define _DARWIN_UNLIMITED_STREAMS before including <stdio.h>:
#define _DARWIN_UNLIMITED_STREAMS
#include <stdio.h>
#include <sys/resource.h>

int main()
{
    struct rlimit rlp;
    FILE *fp[10000];
    int i;

    getrlimit(RLIMIT_NOFILE, &rlp);
    printf("before %d %d\n", (int)rlp.rlim_cur, (int)rlp.rlim_max);

    rlp.rlim_cur = 10000;
    setrlimit(RLIMIT_NOFILE, &rlp);

    getrlimit(RLIMIT_NOFILE, &rlp);
    printf("after %d %d\n", (int)rlp.rlim_cur, (int)rlp.rlim_max);

    for (i = 0; i < 10000; i++) {
        fp[i] = fopen("a.out", "r");
        if (fp[i] == 0) {
            printf("failed after %d\n", i);
            break;
        }
    }
    return 0;
}
prints
before 256 -1
after 10000 -1
failed after 9997
This feature appears to have been introduced in Mac OS X 10.6.
This may be a hard limitation of your libc. Some versions of Solaris have a similar limitation because they store the fd as an unsigned char in the FILE struct. If this is the case for your libc as well, you may not be able to do what you want.
As far as I know, things like setrlimit only affect how many files you can open with open (fopen is almost certainly implemented in terms of open). So if this limitation is at the libc level, you will need an alternate solution.
Of course, you could always not use fopen and instead use the open system call available on just about every variant of Unix.
The downside is that you have to use write and read instead of fwrite and fread, which means you lose things like buffering (that's all done in your libc, not by the OS itself). So it could end up being a performance bottleneck.
Can you describe the scenario that requires 400 files open simultaneously? I am not saying that there is no case where that is needed. But if you describe your use case more clearly, then perhaps we can recommend a better solution.
I know this sounds like a silly question, but do you really need 400 files open at the same time?
By the way, are you running this code as root?
Mac OS doesn't let us change the limit as easily as many other Unix-based operating systems do. We have to create two files:
/Library/LaunchDaemons/limit.maxfiles.plist
/Library/LaunchDaemons/limit.maxproc.plist
describing the max proc and max file limits. The ownership of the files needs to be changed to 'root:wheel'.
This alone doesn't solve the problem; by default, recent versions of macOS enable System Integrity Protection (managed with 'csrutil'), and we need to disable it. To do that, we have to reboot the Mac in recovery mode and disable it from the terminal there using csrutil.
Now we can easily change the maximum open file handle limit from the terminal itself (even in normal boot mode).
This method is explained in detail at the following link: http://blog.dekstroza.io/ulimit-shenanigans-on-osx-el-capitan/
It works for OS X El Capitan and macOS Sierra.
I want to use /dev/random or /dev/urandom in C. How can I do it? I don't know how I can handle them in C; if someone knows, please tell me how. Thank you.
In general, it's a better idea to avoid opening files to get random data, because of how many points of failure there are in the procedure.
On recent Linux distributions, the getrandom system call can be used to get crypto-secure random numbers, and it cannot fail if GRND_RANDOM is not specified as a flag and the read amount is at most 256 bytes.
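For instance, assuming glibc 2.25 or later (which exposes the wrapper in <sys/random.h>):
#include <sys/types.h>
#include <sys/random.h>    /* getrandom(), glibc >= 2.25 */

void
fill_random(char *buf, size_t len)
{
    /* with flags == 0 and len <= 256, this cannot fail or be
     * interrupted once the urandom pool has been initialized */
    ssize_t n = getrandom(buf, len, 0);
    (void)n;
}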
As of October 2017, OpenBSD, Darwin and Linux (with -lbsd) now all have an implementation of arc4random that is crypto-secure and that cannot fail. That makes it a very attractive option:
char myRandomData[50];
arc4random_buf(myRandomData, sizeof myRandomData); // done!
Otherwise, you can use the random devices as if they were files. You read from them and you get random data. I'm using open/read here, but fopen/fread would work just as well.
int randomData = open("/dev/urandom", O_RDONLY);
if (randomData < 0)
{
// something went wrong
}
else
{
char myRandomData[50];
ssize_t result = read(randomData, myRandomData, sizeof myRandomData);
if (result < 0)
{
// something went wrong
}
}
You may read many more random bytes before closing the file descriptor. /dev/urandom never blocks and always fills in as many bytes as you've requested, unless the system call is interrupted by a signal. It is considered cryptographically secure and should be your go-to random device.
/dev/random is more finicky. On most platforms, it can return fewer bytes than you've asked for and it can block if not enough bytes are available. This makes the error handling story more complex:
int randomData = open("/dev/random", O_RDONLY);
if (randomData < 0)
{
// something went wrong
}
else
{
char myRandomData[50];
size_t randomDataLen = 0;
while (randomDataLen < sizeof myRandomData)
{
ssize_t result = read(randomData, myRandomData + randomDataLen, (sizeof myRandomData) - randomDataLen);
if (result < 0)
{
// something went wrong
}
randomDataLen += result;
}
close(randomData);
}
There are other accurate answers above. I needed to use a FILE* stream, though. Here's what I did...
#include <stdio.h>

int byte_count = 64;
char data[64];
FILE *fp = fopen("/dev/urandom", "r");
if (fp != NULL)
{
    fread(data, 1, byte_count, fp);    // ideally check the return value too
    fclose(fp);
}
Just open the file for reading and then read data. In C++11 you may wish to use std::random_device which provides cross-platform access to such devices.
zneak is 100% correct. It's also very common to read a buffer of random numbers that is slightly larger than what you'll need on startup. You can then populate an array in memory, or write them to your own file for later re-use.
A typical implementation of the above:
#include <stdint.h>    /* int64_t */

typedef struct prandom {
    struct prandom *prev;
    int64_t number;
    struct prandom *next;
} prandom_t;
This becomes more or less like a tape that just advances, and which can be magically replenished by another thread as needed. There are a lot of services that provide large file dumps of nothing but random numbers generated with much stronger sources, such as:
Radioactive decay
Optical behavior (photons hitting a semi transparent mirror)
Atmospheric noise (not as strong as the above)
Farms of intoxicated monkeys typing on keyboards and moving mice (kidding)
Don't use 'pre-packaged' entropy for cryptographic seeds, in case that doesn't go without saying. Those sets are fine for simulations, but not fine at all for generating keys and such.
If you aren't concerned with quality and just need a lot of numbers for something like a Monte Carlo simulation, it's much better to have them available in a way that will not cause read() to block.
However, remember that a number is only as random as the process that generated it. /dev/random and /dev/urandom are convenient, but not as strong as using a HRNG (or downloading a large dump from a HRNG). It is also worth noting that /dev/random refills via entropy, so it can block for quite a while depending on circumstances.
zneak's answer covers it simply; however, the reality is more complicated than that. For example, you need to consider whether /dev/{u}random really is the random number device in the first place. Such a scenario may occur if your machine has been compromised and the devices have been replaced with symlinks to /dev/zero or a sparse file. If this happens, the random stream is now completely predictable.
The simplest way (at least on Linux and FreeBSD) is to perform an ioctl call on the device that will only succeed if the device is a random generator:
#include <sys/ioctl.h>
#include <linux/random.h>    /* RNDGETENTCNT (Linux) */

int data;
int result = ioctl(fd, RNDGETENTCNT, &data);
// Upon success, data now contains the amount of entropy available, in bits
If this is performed before the first read of the random device, then there's a fair bet that you've got the random device. So zneak's answer can be extended to be:
int randomData = open("/dev/random", O_RDONLY);
int entropy;
int result = ioctl(randomData, RNDGETENTCNT, &entropy);
if (!result) {
// Error - /dev/random isn't actually a random device
return;
}
if (entropy < sizeof(int) * 8) {
// Error - there's not enough bits of entropy in the random device to fill the buffer
return;
}
int myRandomInteger;
size_t randomDataLen = 0;
while (randomDataLen < sizeof myRandomInteger)
{
ssize_t result = read(randomData, ((char*)&myRandomInteger) + randomDataLen, (sizeof myRandomInteger) - randomDataLen);
if (result < 0)
{
// error, unable to read /dev/random
}
randomDataLen += result;
}
close(randomData);
The Insane Coding blog covered this and other pitfalls not so long ago; I strongly recommend reading the entire article. I have to give them credit, as that is where this solution was pulled from.
Edited to add (2014-07-25)...
Coincidentally, I read last night that as part of the LibReSSL effort, Linux appears to be getting a getrandom(2) syscall. At the time of writing, there is no word on when it will be available in a general kernel release. However, this would be the preferred interface for getting cryptographically secure random data, as it removes all the pitfalls that access via files brings. See also the LibReSSL possible implementation.