Meaning of field d_off in last struct dirent - c

The getdents man page says that d_off holds the offset to the next struct dirent. But what should this field contain for the last dirent? The man page says nothing about it, and I was unable to find the SVr4 standard to look it up there.

"SVr4" means Unix System V Release 4. Solaris is based on that, and Solaris says:
The d_off entry contains a value which is interpretable only by the filesystem that generated it. It may be supplied as an offset to lseek(2) to find the entry following the current one in a directory.
If you look at the example in the Linux manpage, you'll find a program that uses getdents. It doesn't rely on the d_off of the final entry, which is apparently indeterminate, but on the return value from getdents, to determine how many entries there are.
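A minimal Linux-only sketch of that approach, modeled on the man page example: the outer loop stops when getdents64 returns 0, and each buffer is walked by d_reclen, so d_off is never consulted. The struct layout is the one documented in getdents(2), declared locally because older glibc headers do not provide it; count_dirents is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout returned by getdents64, as documented in getdents(2). */
struct linux_dirent64 {
    uint64_t       d_ino;     /* inode number */
    int64_t        d_off;     /* opaque, filesystem-specific cookie */
    unsigned short d_reclen;  /* size of this record in bytes */
    unsigned char  d_type;    /* file type */
    char           d_name[];  /* NUL-terminated file name */
};

/* Linux only: count directory entries (including "." and "..") with the
   raw system call. Returns -1 on error. */
long count_dirents(const char *path)
{
    _Alignas(8) char buf[4096];
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd == -1)
        return -1;

    long count = 0;
    for (;;) {
        long nread = syscall(SYS_getdents64, fd, buf, sizeof buf);
        if (nread == -1) {
            close(fd);
            return -1;
        }
        if (nread == 0)  /* end of directory: this, not d_off, ends the loop */
            break;
        /* Walk the records in this batch by d_reclen; d_off is never read. */
        for (long pos = 0; pos < nread;) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + pos);
            count++;
            pos += d->d_reclen;
        }
    }
    close(fd);
    return count;
}
```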
By the way, the Linux manpage also states quite clearly that you shouldn't be using the getdents system call directly, and that glibc does not even provide a wrapper for it (glibc 2.30 later added getdents64()). Use the POSIX readdir interface instead.

I'd say it's 0: that value cannot indicate a next entry, since a real offset implicitly needs to be larger than 0, and it also wouldn't be wrong, since it points to a valid entry, namely the first one.

Related

How to know Darwin kernel scheduler time slice?

On Linux, sched.h contains the definition of
int sched_rr_get_interval(pid_t pid, struct timespec * tp);
to get the time slice of a process. However, the sched.h shipping with OS X El Capitan doesn't contain that declaration.
Is there an alternative for this on OS X?
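For reference, a sketch of how the Linux call from the question is typically used (Linux/POSIX only, which is exactly what OS X lacks). rr_interval_ns is a hypothetical helper name; for a process not running under SCHED_RR, the value Linux reports depends on the scheduling policy and kernel version.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>
#include <sys/types.h>
#include <time.h>

/* Fetch the round-robin time quantum for pid (0 = calling process) and
   convert it to nanoseconds. Returns 0 on success, -1 on error. */
int rr_interval_ns(pid_t pid, long long *ns)
{
    struct timespec ts;
    if (sched_rr_get_interval(pid, &ts) != 0)
        return -1;
    *ns = (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    return 0;
}
```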
The APIs related to this stuff are pretty byzantine and poorly documented, but here's what I've found.
First, the datatypes related to RR scheduling seem to be in /usr/include/mach/policy.h, around line 155. There's this struct:
struct policy_rr_info {
...
integer_t quantum;
....
};
The quantum is, I think, the timeslice (I'm not sure of the units). Then, grepping around for this or related types defined in the same place, I found the file /usr/include/mach/mach_types.def, which says that the type struct thread_policy_t contains a field policy_rr_info_t on line 203.
Next, I found in /usr/include/mach/thread_act.h the public function thread_policy_get, which can retrieve information about a thread's policy into a struct thread_policy_t *.
So, working backwards. I think (but haven't tried at all) that you can
Use the thread_policy_get() routine to return information about the thread's scheduling state into a thread_policy_t
That struct seems to have a policy_rr_info_t sub-structure
That sub-structure should have a quantum field.
That field appears to be the timeslice, but I don't know about the units.
There are no man pages for this part of the API, but this Apple Developer page explains at least a little bit about how to use this API.
Note that this is all gleaned from just grepping the various kernel headers, and I've definitely not tried to use any of these APIs in any actual code.

How can my C code find the symbol corresponding to an address at run-time (in Linux)?

Given a function or variable run-time address, my code needs to find out the name and, if it's a variable, type information of the symbol. Or at least provide enough information for later, off-line extraction of the name (and type info).
It is Linux code and it is assumed debug information is available.
I tried to look into the ELF file format, binutils, and so on, but the subject is huge, so I was hoping somebody could help me narrow the scope.
I can see the following types of solutions:
find the range of the code/data segments of the modules currently loaded in memory (HOW TO DO THAT?). Save the address's module, segment name, and offset within its segment. Then, offline, use binutils to find the symbol in the module's debug info (again, HOW TO DO THAT?).
use some API/system service I do not know of to find the symbol and info at run time (HOW?).
Thank you in advance.
GNU libc provides a dladdr function for this exact purpose. However, it only works on functions, not variables.
#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <dlfcn.h>
int dladdr(void *addr, Dl_info *info);
The function dladdr() takes a function pointer and tries to resolve
name and file where it is located. Information is stored in the
Dl_info structure:
typedef struct {
const char *dli_fname; /* Pathname of shared object that
contains address */
void *dli_fbase; /* Address at which shared object
is loaded */
const char *dli_sname; /* Name of symbol whose definition
overlaps addr */
void *dli_saddr; /* Exact address of symbol named
in dli_sname */
} Dl_info;
If no symbol matching addr could be found, then dli_sname and dli_saddr
are set to NULL.
dladdr() returns 0 on error, and nonzero on success.
Of course, usually I do this sort of thing from gdb, not within the program itself.
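A minimal sketch of calling dladdr() as described above; describe_address is a hypothetical helper name. It formats the result as symbol+offset plus the containing object, and falls back to "<unknown>" when no dynamic symbol covers the address.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Write "symbol+offset (object)" for addr into out, or "<unknown>" when
   dladdr() finds no matching symbol. Returns 1 if a symbol was found. */
int describe_address(void *addr, char *out, size_t outlen)
{
    Dl_info info;
    if (dladdr(addr, &info) == 0 || info.dli_sname == NULL) {
        snprintf(out, outlen, "%p: <unknown>", addr);
        return 0;
    }
    snprintf(out, outlen, "%s+%#lx (%s)", info.dli_sname,
             (unsigned long)((char *)addr - (char *)info.dli_saddr),
             info.dli_fname);
    return 1;
}
```

Since dladdr() consults the dynamic symbol table, functions in the main executable are only found if it was linked with -rdynamic; on glibc older than 2.34, also link with -ldl.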
What you want to look at is the Binary File Descriptor library (libbfd), specifically its symbol-handling functions. libbfd provides a common set of functions for manipulating and reading various object formats. It does this by providing an abstract view of object files, with specific back ends that handle the details of particular object formats and architectures. The ELF format is supported, as is most likely the architecture you want to use.
I don't find libbfd difficult to use, but I am always open to alternatives, and libelf is another one. You will probably want to look at the gelf_getsym function specifically.
C is a fully-compiled language. The names and types and other info about variables are generally discarded in the compilation process.
An exception is that most compilers will produce an executable with debugging information included, so that a live debugger has access to this information. This info is totally OS-specific, and even compiler-specific, and might even be in parts of memory not accessible to the program.
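For the first approach in the question (record the module and an offset at run time, resolve the symbol offline), one Linux-specific way to find which loaded module contains an address is to scan /proc/self/maps. This is a sketch under that assumption; find_mapping is a hypothetical name. It yields a file offset, and converting that into the address a tool such as addr2line expects may require consulting the ELF program headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Linux only: find the mapping in /proc/self/maps that contains addr.
   On success stores the mapped file's path (empty for anonymous memory)
   and the file offset corresponding to addr, and returns 1.
   Returns 0 if no mapping contains addr, -1 if the maps file can't be read.
   pathlen must be at least 1. */
int find_mapping(uintptr_t addr, char *path, size_t pathlen, uintptr_t *file_off)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char line[4096 + 256];
    int found = 0;
    while (!found && fgets(line, sizeof line, maps) != NULL) {
        /* Line format: start-end perms offset dev inode [pathname] */
        unsigned long lo, hi, offset;
        int name_pos = 0;
        if (sscanf(line, "%lx-%lx %*s %lx %*s %*s %n",
                   &lo, &hi, &offset, &name_pos) < 3 || name_pos == 0)
            continue;
        if (addr < lo || addr >= hi)
            continue;

        const char *name = line + name_pos;
        size_t len = strcspn(name, "\n");
        if (len >= pathlen)
            len = pathlen - 1;
        memcpy(path, name, len);
        path[len] = '\0';
        *file_off = addr - lo + offset;
        found = 1;
    }
    fclose(maps);
    return found;
}
```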

POSIX statvfs required behaviour

POSIX statvfs() description says:
The following flags can be returned in the f_flag member:
ST_RDONLY - Read-only file system.
ST_NOSUID - Setuid/setgid bits ignored by exec.
It is unspecified whether all members of the statvfs structure have meaningful values on all file systems.
Also sys/statvfs.h description:
The <sys/statvfs.h> header shall define the following symbolic constants for the f_flag member:
ST_RDONLY - Read-only file system.
ST_NOSUID - Does not support the semantics of the ST_ISUID and ST_ISGID file mode bits.
How to interpret this correctly? I mean:
Does it allow a POSIX-compliant system to return nonsense where ST_RDONLY is meaningful?
If a statvfs structure member is meaningful for a particular filesystem, is the OS allowed to return nonsense (I understand some fields may have no meaning for synthetic filesystems like /proc)?
Is there any OS known to return an incorrect ST_RDONLY or ST_NOSUID for filesystems used to store data/executables, while claiming POSIX compliance for its statvfs() implementation?
The POSIX spec requires very little of statvfs(), aside from its existence.
In particular, it requires that statvfs() fill the specified struct statvfs * buffer with "information about the file system," but does not guarantee the meaning of that information. In other words it could be complete garbage and in fact is on many systems (including HFS+ on OS X).
That includes the f_flag member of struct statvfs, which can be masked with ST_RDONLY and/or ST_NOSUID, but those bits may not be set on all filesystems (even when they should be).
If you need to reliably obtain filesystem information across multiple platforms, you may (ironically) have to resort to an unstandardized function like statfs(). On Linux, however, statvfs() behaves pretty well on most non-synthetic filesystems.
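A minimal sketch of reading the two POSIX-defined f_flag bits; fs_flags is a hypothetical name. Per the answer above, a set or clear bit is only as trustworthy as the platform's implementation for that particular filesystem.

```c
#include <sys/statvfs.h>

/* Returns -1 if statvfs() fails; otherwise a bitmask:
   1 = ST_RDONLY is set, 2 = ST_NOSUID is set. */
int fs_flags(const char *path)
{
    struct statvfs sv;
    int result = 0;

    if (statvfs(path, &sv) != 0)
        return -1;
    if (sv.f_flag & ST_RDONLY)
        result |= 1;
    if (sv.f_flag & ST_NOSUID)
        result |= 2;
    return result;
}
```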

How to get file size in ANSI C without fseek and ftell?

While looking for ways to find the size of a file given a FILE*, I came across this article advising against it. Instead, it seems to encourage using file descriptors and fstat.
However I was under the impression that fstat, open and file descriptors in general are not as portable (After a bit of searching, I've found something to this effect).
Is there a way to get the size of a file in ANSI C while keeping in line with the warnings in the article?
In standard C, the fseek/ftell dance is pretty much the only game in town. Anything else you'd do depends at least in some way on the specific environment your program runs in. Unfortunately said dance also has its problems as described in the articles you've linked.
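A sketch of the fseek/ftell dance with the checks it needs, written in C89 style to match the question. It assumes a binary stream that meaningfully supports SEEK_END (which ISO C does not guarantee) and restores the original position; size_via_fseek is a hypothetical name.

```c
#include <stdio.h>

/* Size of a binary stream via fseek/ftell, restoring the original position.
   Returns -1 if any step fails, e.g. on a stream that cannot seek (a pipe)
   or a file larger than LONG_MAX bytes. */
long size_via_fseek(FILE *fp)
{
    long saved, size;

    saved = ftell(fp);
    if (saved == -1L)
        return -1L;
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1L;
    size = ftell(fp);
    if (fseek(fp, saved, SEEK_SET) != 0)
        return -1L;
    return size;
}
```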
I guess you could always read everything out of the file until EOF and keep track along the way - with fread() for example.
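A sketch of the read-to-EOF approach: pure ISO C, and it also works on pipes, but it consumes the stream and costs a full read. size_via_fread is a hypothetical name, and the count is kept in a long to stay within C89.

```c
#include <stdio.h>

/* Count the bytes remaining in a stream by reading it to EOF.
   Returns -1 on a read error. (long may overflow past 2 GiB on
   platforms where long is 32 bits.) */
long size_via_fread(FILE *fp)
{
    char buf[4096];
    long total = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        total += (long)n;
    return ferror(fp) ? -1L : total;
}
```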
The article claims fseek(stream, 0, SEEK_END) is undefined behaviour by citing an out-of-context footnote.
The footnote appears in text dealing with wide-oriented streams, that is, streams on which the first operation performed is a wide-character operation.
This undefined behaviour stems from the combination of two paragraphs. First §7.19.2/5 says that:
— Binary wide-oriented streams have the file-positioning restrictions ascribed to both text and binary streams.
And the restrictions for file-positioning with text streams (§7.19.9.2/4) are:
For a text stream, either offset shall be zero, or offset shall be a value returned by an earlier successful call to the ftell function on a stream associated with the same file and whence shall be SEEK_SET.
This makes fseek(stream, 0, SEEK_END) undefined behaviour for wide-oriented streams. There is no such rule like §7.19.2/5 for byte-oriented streams.
Furthermore, when the standard says:
A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END.
It doesn't mean it's undefined behaviour to do so. But if the stream supports it, it's ok.
Apparently this exists to allow binary files to have coarse size granularity, i.e. for the size to be a number of disk sectors rather than a number of bytes, so that an unspecified number of zeros may magically appear at the end of a binary file. SEEK_END cannot be meaningfully supported in that case. Other examples include pipes and infinite files like /dev/zero. However, the C standard provides no way to distinguish between such cases, so you're stuck with system-dependent calls if you want to account for them.
Use fstat(). It needs a file descriptor, which you can get from the FILE * with fileno() (both are POSIX, not ISO C); the size is then in st_size, along with other details.
i.e.
fstat(fileno(filePointer), &buf);
where filePointer is the FILE *
and
buf is a struct stat:
struct stat {
dev_t st_dev; /* ID of device containing file */
ino_t st_ino; /* inode number */
mode_t st_mode; /* protection */
nlink_t st_nlink; /* number of hard links */
uid_t st_uid; /* user ID of owner */
gid_t st_gid; /* group ID of owner */
dev_t st_rdev; /* device ID (if special file) */
off_t st_size; /* total size, in bytes */
blksize_t st_blksize; /* blocksize for file system I/O */
blkcnt_t st_blocks; /* number of 512B blocks allocated */
time_t st_atime; /* time of last access */
time_t st_mtime; /* time of last modification */
time_t st_ctime; /* time of last status change */
};
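A sketch of that approach with the checks it usually needs; size_via_fstat is a hypothetical name. The fflush() matters because st_size only reflects data that has reached the OS, not what is still sitting in the FILE buffer, and S_ISREG is checked because st_size is not a data length for pipes, terminals, or devices.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* POSIX only: size of the regular file behind fp, or -1. */
long long size_via_fstat(FILE *fp)
{
    struct stat st;

    /* Push buffered output to the OS so st_size includes it. */
    if (fflush(fp) != 0)
        return -1;
    if (fstat(fileno(fp), &st) != 0)
        return -1;
    /* For pipes, terminals, devices, etc. st_size is not a data length. */
    if (!S_ISREG(st.st_mode))
        return -1;
    return (long long)st.st_size;
}
```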
The executive summary is that you must use fseek/ftell because there is no alternative (even the implementation specific ones) that is better.
The underlying issue is that the "size" of a file in bytes is not always the same as the length of the data in the file and that, in some circumstances, the length of the data is not available.
A POSIX example is what happens when you write data to a device; the operating system only knows the size of the device. Once the data has been written and the (FILE*) closed there is no record of the length of the data written. If the device is opened for read the fseek/ftell approach will either fail or give you the size of the whole device.
When the ANSI C committee was sitting at the end of the 1980s, a number of operating systems its members remembered simply did not store the length of the data in a file; rather, they stored the disk blocks of the file and assumed that something in the data terminated it. The 'text' stream represents this. Opening a 'binary' stream on those files shows not only the magic terminator byte, but also any bytes beyond it that were never written but happen to be in the same disk block.
Consequently the C-90 standard was written so that it is valid to use the fseek trick; the result is a conformant program, but the result may not be what you expect. The behavior of that program is not 'undefined' in the C-90 definition and it is not 'implementation-defined' (because on UN*X it varies with the file). Neither is it 'invalid'. Rather you get a number you can't completely rely on or, maybe, depending on the parameters to fseek, -1 and an errno.
In practice if the trick succeeds you get a number that includes at least all the data, and this is probably what you want, and if the trick fails it is almost certainly someone else's fault.
John Bowler
Different OSes provide different APIs for this. For example, on Windows there is:
GetFileAttributesEx() (or GetFileSizeEx() for an open handle)
On macOS there is:
[[[NSFileManager defaultManager] attributesOfItemAtPath:someFilePath error:nil] fileSize];
But the only portable methods are fread or fseek/ftell:
How can I get a file's size in C?
You can't always avoid writing platform-specific code, especially when you have to deal with things that are a function of the platform. File sizes are a function of the file system, so as a rule I'd use the native filesystem API to get that information over the fseek/ftell dance. I'd create my own generic wrapper around it, so as to not pollute application logic with platform-specific details and make the code easier to port.
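A sketch of such a wrapper, assuming Windows and POSIX are the only two targets; file_size is a hypothetical name. On Linux or macOS only the stat() branch is compiled, and the Windows branch uses GetFileAttributesExA with the standard attribute-data structure.

```c
#ifdef _WIN32
#include <windows.h>

/* Windows: size of the file at path in bytes, or -1. */
long long file_size(const char *path)
{
    WIN32_FILE_ATTRIBUTE_DATA data;
    if (!GetFileAttributesExA(path, GetFileExInfoStandard, &data))
        return -1;
    return ((long long)data.nFileSizeHigh << 32) | data.nFileSizeLow;
}
#else
#include <sys/stat.h>

/* POSIX: size of the file at path in bytes, or -1. */
long long file_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}
#endif
```

Application code then calls file_size() and never sees the platform details.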
The article has a little problem of logic.
It (correctly) identifies that a certain usage of C functions has behavior which is not defined by ISO C. But then, to avoid this undefined behavior, the article proposes a solution: replace that usage with platform-specific functions. Unfortunately, the use of platform-specific functions is also undefined according to ISO C. Therefore, the advice does not solve the problem of undefined behavior.
The quote in my copy of the 1999 standard confirms that the alleged behavior is indeed undefined:
A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END. [ISO 9899:1999 7.19.9.2 paragraph 3]
But undefined behavior does not mean "bad behavior"; it is simply behavior for which the ISO C standard gives no definition. Not all undefined behaviors are the same.
Some undefined behaviors are areas in the language where meaningful extensions can be provided. The platform fills the gap by defining a behavior.
Providing a working fseek which can seek from SEEK_END is an example of an extension in place of undefined behavior. It is possible to confirm whether or not a given platform supports fseek from SEEK_END, and if this is provisioned, then it is fine to use it.
Providing a separate function like lseek is also an extension in place of undefined behavior (the undefined behavior of calling a function which is not in ISO C and not defined in the C program). It is fine to use that, if available.
Note that those platforms which have functions like the POSIX lseek will also likely have an ISO C fseek which works from SEEK_END. Also note that on platforms where fseek on a binary file cannot seek from SEEK_END, the likely reason is that this is impossible to do (no API can be provided to do it and that is why the C library function fseek is not able to support it).
So, if fseek does provide the desired behavior on the given platform, then nothing has to be done to the program; it is a waste of effort to change it to use that platform's special function. On the other hand, if fseek does not provide the behavior, then likely nothing does, anyway.
Note that even including a nonstandard header that is not part of the program is undefined behavior (by omission of any definition of behavior). For instance, if the following appears in a C program:
#include <unistd.h>
the behavior is not defined after that. [See References below.] The behavior of the preprocessing directive #include is defined, of course. But this creates two possibilities: either the header <unistd.h> does not exist, in which case a diagnostic is required. Or the header does exist. But in that case, the contents are not known (as far as ISO C is concerned; no such header is documented for the Library). In this case, the include directive brings in an unknown chunk of code, incorporating it into the translation unit. It is impossible to define the behavior of an unknown chunk of code.
#include <platform-specific-header.h> is one of the escape hatches in the language for doing anything whatsoever on a given platform.
In point form:
Undefined behavior is not inherently "bad" and not inherently a security flaw (though of course it can be! E.g. buffer overruns linked to the undefined behaviors in the area of pointer arithmetic and dereferencing.)
Replacing one undefined behavior with another, only for the purpose of avoiding undefined behavior, is pointless.
Undefined behavior is just a special term used in ISO C to denote things that are outside of the scope of ISO C's definition. It does not mean "not defined by anyone in the world" and doesn't imply something is defective.
Relying on some undefined behaviors is necessary for making most real-world, useful programs, because many extensions are provided through undefined behavior, including platform-specific headers and functions.
Undefined behavior can be supplanted by definitions of behavior from outside of ISO C. For instance the POSIX.1 (IEEE 1003.1) series of standards defines the behavior of including <unistd.h>. An undefined ISO C program can be a well defined POSIX C program.
Some problems cannot be solved in C without relying on some kind of undefined behavior. An example of this is a program that wants to seek so many bytes backwards from the end of a file.
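A sketch of that last case: reading the final n bytes with fseek(fp, -n, SEEK_END), where a nonzero return from fseek is how an unsupported platform, a non-seekable stream such as a pipe, or a file shorter than n shows up. read_tail is a hypothetical name.

```c
#include <stdio.h>

/* Read the last n bytes of a binary stream into buf (which must hold
   n bytes). Relies on fseek(..., SEEK_END), which ISO C leaves to the
   platform. Returns the number of bytes read, or -1 if fseek fails. */
long read_tail(FILE *fp, char *buf, long n)
{
    if (fseek(fp, -n, SEEK_END) != 0)
        return -1L;
    return (long)fread(buf, 1, (size_t)n, fp);
}
```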
References:
Dan Pop, comp.std.c, Dec 2002: http://groups.google.com/group/comp.std.c/msg/534ab15a7bc4e27e?dmode=source
Chris Torek, comp.std.c, Feb 2002 [on nonstandard functions being undefined behavior]: http://groups.google.com/group/comp.lang.c/msg/2fddb081336543f1?dmode=source
Chris Engebretson, comp.lang.c, Apr 1997: http://groups.google.com/group/comp.lang.c/msg/3a3812dbcf31de24?dmode=source
Ben Pfaff, comp.lang.c, Dec 1998 [jestful answer citing the undefinedness of including nonstandard headers]: http://groups.google.com/group/comp.lang.c/msg/73b26e6892a1ba4f?dmode=source
Lawrence Kirby, comp.lang.c, Sep 1998 [explains effects of nonstandard headers]: http://groups.google.com/group/comp.lang.c/msg/c85a519fc63bd388?dmode=source
Christian Bau, comp.lang.c, Sep 1997 [explains how the undefined behavior of #include <pascal.h> can bring in a pascal keyword for linkage]: http://groups.google.com/group/comp.lang.c/msg/e2762cfa9888d5c6?dmode=source

Does C varargs use a keyword called 'end'?

I have a lot of code that uses C-style variable arguments. The code passes a variable called end as the last argument of our variable-length function calls, and the code also has an enumerator called end. They never clashed until I switched to the VC 10.0 compiler (VS 2010); now the compiler reports an ambiguous definition, but it won't tell me where the mysterious second end is defined.
So is end some sort of reserved keyword used especially in variable args?
I know very little about them. But I've looked at tons of documentation on variable arguments, as well as searching here, and found nothing (which could be a good thing). So I would guess the answer is that end is not a special word used with varargs. Can I get someone to confirm this?
No -- C doesn't define end as having any special meaning with varargs. When you write a function that takes a variable argument list, it's up to you to decide how the function learns how long the list is. Two popular conventions are having the first argument specify (at least indirectly) how many more arguments there are, and passing a "sentinel" value (e.g., NULL) after all the others. For example, printf does the former, execl the latter.
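A sketch of both conventions; neither involves any name like end, since the list length is purely a contract between caller and callee. The function names are hypothetical.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Convention 1: the first argument says how many values follow. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* Convention 2: a sentinel marks the end of the list, as with execl().
   Callers must pass the sentinel as (char *)NULL rather than a bare NULL
   or 0, because variadic arguments get no conversion to the expected type. */
size_t total_length(const char *first, ...)
{
    va_list ap;
    size_t total = 0;
    const char *s;

    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, char *))
        total += strlen(s);
    va_end(ap);
    return total;
}
```

Usage: sum_ints(3, 1, 2, 3) and total_length("ab", "cde", (char *)NULL).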
Once upon a long time ago (7th Edition Unix, for example), there were three external symbols defined: etext, edata and end. These marked the end of the program text, the end of the initialized data, and the end of the uninitialized data (bss), which is where the heap began. It may be that your definition of end is colliding with that, somehow.
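These symbols still exist on Linux, where the end(3) man page documents them. A sketch that prints their addresses, assuming a Linux/Unix toolchain that provides them; only their addresses are meaningful, never their contents, and print_segment_ends is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>

/* Linker-provided symbols (see end(3)); use only their addresses. */
extern char etext, edata, end;

void print_segment_ends(void)
{
    printf("end of text (etext):             %p\n", (void *)&etext);
    printf("end of initialized data (edata): %p\n", (void *)&edata);
    printf("end of bss (end):                %p\n", (void *)&end);
}
```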
