Sending arguments to ftw() - c

Is there a way to pass arguments to ftw() for use in processing each file/directory on the path? Making the argument a global variable is problematic because of multithreading: a global value would be visible to all threads, and that would be wrong.

A properly designed C callback interface has a void* argument that you can use to pass arbitrary data from the surrounding code into the callback. [n]ftw does not have such an argument, so you're kinda up a creek.
If your compiler supports thread-local variables (the __thread storage specifier) you can use them instead of globals; this will work but is not really that much tidier than globals.
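A rough sketch of the thread-local workaround, assuming GCC or Clang (which accept __thread). The names walk_ctx, count_entry and count_tree are made up for illustration. The state is parked in a per-thread pointer just before calling ftw() and read back inside the callback:

```c
#include <assert.h>
#include <ftw.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical per-walk state; ftw() has no parameter to carry it. */
struct walk_ctx {
    long files;
    long dirs;
};

/* One pointer per thread, so concurrent walks do not see each other's state.
   __thread is a compiler extension (GCC/Clang), not standard C99. */
static __thread struct walk_ctx *current_ctx;

/* ftw() callback: counts regular files and directories. */
static int count_entry(const char *path, const struct stat *sb, int type)
{
    (void)path;
    (void)sb;
    if (type == FTW_F)
        current_ctx->files++;
    else if (type == FTW_D)
        current_ctx->dirs++;
    return 0; /* 0 tells ftw() to keep walking */
}

/* Walks root and fills *ctx; returns ftw()'s result (0 on success). */
int count_tree(const char *root, struct walk_ctx *ctx)
{
    assert(root != NULL && ctx != NULL);
    ctx->files = 0;
    ctx->dirs = 0;
    current_ctx = ctx;                 /* "pass" the argument */
    int rc = ftw(root, count_entry, 16);
    current_ctx = NULL;
    return rc;
}
```

This relies on ftw() invoking the callback on the calling thread, which the interface does not spell out, so treat it as an assumption.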
If your C library has the fts family of functions, use those instead. They are available on most modern Unixes (including Linux, OSX, and recent *BSD) and gnulib has a fallback implementation.
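For comparison, a minimal sketch of the same kind of walk with fts, assuming a C library that ships <fts.h>. The names fts_counts and count_tree_fts are hypothetical. Because fts hands entries back in a loop rather than through a callback, the state is just an ordinary parameter:

```c
#include <assert.h>
#include <errno.h>
#include <fts.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical result type for this sketch. */
struct fts_counts {
    long files;
    long dirs;
};

/* Walks root and fills *out; returns 0 on success, -1 on error. */
int count_tree_fts(const char *root, struct fts_counts *out)
{
    assert(root != NULL && out != NULL);
    char *const paths[] = { (char *)root, NULL };

    /* FTS_NOCHDIR keeps fts from changing the process-wide working directory. */
    FTS *fts = fts_open(paths, FTS_PHYSICAL | FTS_NOCHDIR, NULL);
    if (fts == NULL)
        return -1;

    out->files = 0;
    out->dirs = 0;

    FTSENT *ent;
    while ((ent = fts_read(fts)) != NULL) {
        switch (ent->fts_info) {
        case FTS_F:  out->files++; break;
        case FTS_D:  out->dirs++;  break; /* pre-order visit; FTS_DP is the post-order one */
        default:     break;
        }
    }
    /* fts_read() returns NULL with errno == 0 once the hierarchy is exhausted. */
    int failed = (errno != 0);

    if (fts_close(fts) != 0)
        failed = 1;
    return failed ? -1 : 0;
}
```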

Related

ABI of functions in system libraries

I'm generating machine code to call functions from existing system libraries. Most system libraries were written in C, so I'll take C as an example, but the question probably applies to any other language.
If I understand this answer correctly, C compilers are free to choose the ABI/calling convention of a function as long as they preserve the semantics. For instance they can choose to pass a pointer for the returned value as an argument to obtain copy-elision.
Does this mean that no one can ever truly know what's the right way to call a function from a library, even if its C signature is known?
Is this a real concern in practice? Or is it safe to assume that all the functions with non-mangled names from system libraries always use the system's default calling convention?
What other assumptions or considerations can I make about the ABI/calling convention of functions with non-mangled names in system libraries?
C compilers are free to choose the ABI/calling convention of a function as long as they preserve the semantics.
Well, yes and no. The ABI is often defined by the target system, in which case the compiler has to fall in line. In case there exists no ABI for the target system (often the case in microcontroller programming), the compiler is free to do as it pleases, essentially inventing the ABI.
Does this mean that no one can ever truly know what's the right way to call a function from a library, even if its C signature is known?
No, you can't, unless you know the target system and calling convention. Some systems have several "de facto" standards, such as x86 Windows __cdecl vs __stdcall; see https://en.wikipedia.org/wiki/X86_calling_conventions
Is this a real concern in practice?
Not within a program written entirely in C. But it becomes a big problem in case the program links external libs such as Windows DLLs, possibly written in other languages. Then you have to use the right calling convention or the program will soon crash.
It's also a very real concern whenever you attempt to mix assembler and C for the given system - the C compiler will handle stacking according to the calling convention, but in the assembler part you have to write this manually. This can also affect the C code, if it is written with care to suit assembler. You'd then pick parameter and return types that are convenient to use.
If I understand this answer correctly, C compilers are free to choose the ABI/calling convention of a function as long as they preserve the semantics. For instance they can choose to pass a pointer for the returned value as an argument to obtain copy-elision.
I don't see how you conclude that from the answer you referenced. Calling conventions are a characteristic of the function, as it appears in compiled form. The compiler can do all manner of tricks at the point of call, but changing or ignoring the calling conventions of the function implementation is not one of them. Where it is possible, copy elision for returned structure values (the subject of that answer) does not rely on any such thing.
Does this mean that no one can ever truly know what's the right way to call a function from a library, even if its C signature is known?
Yes and no. The function signature alone does not convey anything about calling convention (with some caveats; see below), but libraries simply could not work if there were no way to know calling conventions. In practice, it is usually the case that calling convention (and ABI overall) is standardized on a per-platform basis.
Thus, for example, Linux implementations for x86_64 substantially all follow the same conventions. All the toolchains targeting that platform both use that convention for function calls and provide for functions to be called according to it. Compilers for Win64 likewise follow the appropriate (different) conventions.
Windows is in fact an interesting case, however, because historically, it has supported multiple calling conventions. In its case, there is a default convention, and different conventions can be specified in function declarations via extension keywords. The compiler knows which convention to use based on the function declaration.
Additionally, where they are not concerned with interoperability, compilers can do anything within their power. So, for example, when compiling a function with internal linkage, a compiler could, in principle, use whatever calling convention it wants, as it is in full control of both the function and all callers (ignoring the possible effect of function pointers). This is not different in kind from compilers' ability to inline functions. As a practical matter, however, I would not expect compilers to use variant calling conventions under such circumstances, and I am not aware of any that do.
Is this a real concern in practice? Or is it safe to assume that all the functions with non-mangled names from system libraries always use the system's default calling convention?
Name mangling has nothing to do with it. That's part of a higher-level mapping of C++ (usually) semantics onto system-level, source-language-independent object-file formats.
Generally speaking, it is safe to assume that where the appropriate function declarations are in scope (from the library's header files, typically), the compiler will generate correct calls. This is an essential interoperability characteristic that is rarely violated in practice. It cannot be construed as a universal guarantee, but in practice, it is not something that you should worry about.
What other assumptions or considerations can I make about the ABI/calling convention of functions with non-mangled names in system libraries?
I'm unsure what kinds of assumptions you have in mind, and I suspect you're overcomplicating things. You make sure to include the header(s) from the relevant library that declare the functions you want to call. Having done so, you rely on your compiler to generate correct calls.

Providing external routines from a C library in a threadsafe manner

I have a C library wrapped around a Fortran library that I want to use in OCaml. The obvious solution is to map the C interface into OCaml routines using some handwritten code to deal with GC.
However, it turns out that the algorithm implemented by the Fortran library gets its inputs as EXTERNAL routines, i.e.:
EXTERNAL RHS
This means that the input is essentially passed by the linker. The C-wrapper has a nice interface collecting all required input in a struct, but essentially provides one global instance of that struct and then defines all the missing external routines in terms of that global instance.
As a functional programmer, this smells like an antipattern to me. Since I do not want to rewrite the Fortran code, my question is:
Is there a safe, idiomatic way to link the Fortran library while avoiding global state clashes? Can the C library provide the global state of the Fortran library without rewriting the Fortran code?
If no such way exists, what is a good C11 (i.e. OS-independent) idiom to protect the global state? I'd need a kind of global lock that only allows access through a key that is issued exactly once.
I just read about thread local declarations in C11, would that be an option?

Are there any pitfalls when passing function pointers between compilation units?

I ask because I am using a PIC microcontroller to operate hardware asynchronously, and implementing function pointers as a callback mechanism would be of benefit.
An example would be an I2C library that accepts read and write 'jobs' and sequentially executes each 'job' as the hardware resource becomes available (and as the user ticks the I2C software state machine). Depending on how the implementer uses the I2C library, they may wish to manipulate the data before it is returned (bitmasking, setting flags, etc.); this is where I'm thinking of adding an I2C callback mechanism.
The user would pass a job, which includes a callback pointing to a function in the calling compilation unit. Is this allowed? And are there any cases that I need to be careful of if it is allowed?
Passing pointers between compilation units is done all the time. For example, free() in the standard library is certainly compiled separately and yet takes a pointer as its argument.
Within many projects, including the Linux kernel, callbacks between compilation units are used often.
The key is to use common header files for declaring shared variables, function prototypes, and such. If you define a function using a long pointer, but call it using a declaration that specifies a char pointer, you're entering Undefined Behavior territory.
Also watch out for compiler flags that may change variable sizes, default packing, and such.
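To illustrate the shared-header point, here is a single-file sketch in which the comments mark what would live in which file. Everything in it (i2c_job, i2c_done_cb, i2c_complete_job, mask_low_nibble, the header name) is hypothetical and not taken from any real I2C library. The callback type is defined once, so the library and the caller cannot disagree about its signature:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ---- would live in a shared header, e.g. a hypothetical i2c_job.h ---- */

/* Single definition of the callback signature, seen by both sides.
   The void *user argument carries caller state without globals. */
typedef void (*i2c_done_cb)(uint8_t *data, size_t len, void *user);

struct i2c_job {
    uint8_t    *buf;      /* data read from the bus */
    size_t      len;
    i2c_done_cb on_done;  /* may be NULL */
    void       *user;     /* passed back to on_done untouched */
};

/* ---- library compilation unit ---- */

/* Called by the (omitted) state machine once a job has finished. */
void i2c_complete_job(struct i2c_job *job)
{
    assert(job != NULL);
    if (job->on_done != NULL)
        job->on_done(job->buf, job->len, job->user);
}

/* ---- caller's compilation unit ---- */

/* Post-processes the data: keeps the low nibble and raises a flag. */
void mask_low_nibble(uint8_t *data, size_t len, void *user)
{
    int *done_flag = user;
    for (size_t i = 0; i < len; i++)
        data[i] &= 0x0F;
    *done_flag = 1;
}
```

A caller would fill in a struct i2c_job with mask_low_nibble and a pointer to its own flag, then hand the job to the library.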

Is reentrancy a requirement for getpwnam_r()?

getpwnam_r() is reentrant according to a number of manpages. However, the standard only states:
The getpwnam_r() function is thread-safe and returns values in a user-supplied buffer instead of possibly using a static data area that may be overwritten by each call.
I am confused. Must an NSS module's ...getpwnam_r() function be reentrant? Or is being thread-safe enough?
Well, as you note the standard requires that the function must be thread-safe. That doesn't prevent an implementation from providing a stricter guarantee.
IOW, portable software cannot assume that getpwnam_r is reentrant. But, if you care only about some specific platform which guarantees that it's reentrant, then presumably you can assume that.
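For reference, a minimal sketch of what "returns values in a user-supplied buffer" looks like from the caller's side. The helper name lookup_uid, its return convention and the fixed buffer size are assumptions made for this example:

```c
#include <assert.h>
#include <pwd.h>
#include <stddef.h>
#include <sys/types.h>

/* Looks up a user's uid without touching any static data area.
   Returns 0 if found, -1 if there is no such user, or a positive
   error number (for example ERANGE if the buffer is too small). */
int lookup_uid(const char *name, uid_t *uid_out)
{
    assert(name != NULL && uid_out != NULL);

    struct passwd pw;
    struct passwd *result = NULL;
    char buf[16384]; /* assumed large enough; sysconf(_SC_GETPW_R_SIZE_MAX) gives a hint */

    int rc = getpwnam_r(name, &pw, buf, sizeof buf, &result);
    if (rc != 0)
        return rc;          /* lookup failed with an error number */
    if (result == NULL)
        return -1;          /* no matching entry */

    *uid_out = pw.pw_uid;   /* strings in pw point into buf, so copy what you need */
    return 0;
}
```

Each caller owns its struct passwd and buffer, which is what makes concurrent calls from several threads safe; it says nothing by itself about reentrancy in the signal-handler sense.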

Using __thread in c99

I would like to define a few variables as thread-specific using the __thread storage class. But three questions make me hesitate:
Is it really standard in c99? Or more to the point, how good is the compiler support?
Will the variables be initialised in every thread?
Do non-multi threaded programs treat them as plain-old-globals?
To answer your specific questions:
No, it is not part of C99. You will not find it mentioned anywhere in the n1256.pdf (C99+TC1/2/3) or the original C99 standard.
Yes, __thread variables start out with their initialized value in every new thread.
From a standpoint of program behavior, thread-local storage class variables behave pretty much the same as plain globals in non-multi-threaded programs. However, they do incur a bit more runtime cost (memory and startup time), and there can be issues with limits on the size and number of thread-local variables. All this is rather complicated and varies depending on whether your program is static- or dynamic-linked and whether the variables reside in the main program or a shared library...
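A small sketch of the second point, assuming GCC or Clang with POSIX threads. The names counter, worker, value_seen_by_new_thread and main_thread_value are made up for this example. The first thread changes its own copy, and a thread created afterwards still starts from the initializer:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Each thread gets its own copy, starting from the constant initializer. */
static __thread int counter = 10;

static void *worker(void *arg)
{
    int *seen = arg;
    *seen = counter;  /* value this new thread starts with */
    counter += 5;     /* changes only this thread's copy */
    return NULL;
}

/* Changes the calling thread's copy, then reports what a new thread sees. */
int value_seen_by_new_thread(void)
{
    int seen = -1;
    pthread_t t;

    counter = 99;     /* affects the calling thread only */
    if (pthread_create(&t, NULL, worker, &seen) != 0)
        return -1;
    int rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;
    return seen;
}

/* Returns the calling thread's copy. */
int main_thread_value(void)
{
    return counter;
}
```

On older glibc versions this needs to be linked with -pthread.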
Outside of implementing C/POSIX (e.g. errno, etc.), thread-local storage class is actually not very useful, in my opinion. It's pretty much a crutch for avoiding cleanly passing around the necessary state in the form of a context pointer or similar. You might think it could be useful for getting around broken interfaces like qsort that don't take a context pointer, but unfortunately there is no guarantee that qsort will call the comparison function in the same thread that called qsort. It might break the job down and run it in multiple threads. Same goes for most other interfaces where this sort of workaround would be possible.
You probably want to read this:
http://www.akkadia.org/drepper/tls.pdf
1) MSVC doesn't support C99. GCC does and other compilers attempt GCC compatibility.
Edit: a breakdown of compiler support for __thread is available here:
http://chtekk.longitekk.com/index.php?/archives/2011/02/C8.html
2) Yes. An initializer is allowed, but it must be a constant expression.
3) Non-multi-threaded applications are single-threaded applications.
