Having trouble understanding how "readdir.c" works in the Linux Kernel

Having trouble understanding how "readdir.c" works in the Linux Kernel - c

I've heard filldir()/filldir64() lets the kernel specify the "dirent" (directory entry) layout depending on the binary type. Why do differing binary types matter?
I am also confused about dir_context, I assume filldir64() can be defined by other filesystems and implemented differently? So is that why there is an actor member?
filldir64(): https://elixir.bootlin.com/linux/latest/source/fs/readdir.c#L307
dir_context: https://elixir.bootlin.com/linux/latest/source/include/linux/fs.h#L1776

The reason for this is to run old binaries. The ABI of the readdir() system call has changed a few times. The way this works is each system call number calls a different function. There's a common implementation of the filldir loop that dispatches to the filesystem-specific calls to read directory entires, and again dispatches to the appropriate filldir() implementation.
dir_context is defined by the specific implementation of filldir() to retain some state between calls. Desipite being written in C, the filesystem is object-oriented code. When you see stuff like this, start thinking virtual method dispatch, becasue effectively that's what it is. But since it's done in C, the this pointer has to be passed manually as the first argument.

Related

nice() library call or sys call?

Should a function to change the priority of the calling process (eg: nice()) be implemented as a library call or as a system call? I was reading about it online and from what I understood, it used to be a system call but it's a library call now. Why so?

All functions are library calls. The question you're looking at is a sloppy shorthand for the question of whether or not there's a syscall that corresponds directly to the semantics, so that the library function can do something trivial along the lines of return syscall(SYS_foo, ...);.
Often it happens that at some point in history there was a syscall that corresponded fully to the function's operations at that point in history, but either:
new requirements on the function rendered the existing syscall incapable of meeting the function's needs, or
a bug was discovered in the syscall that's fundamental to its interface
In either case, if there's a reasonable way to implement the operation entirely with other syscalls, that usually makes more sense than implementing a "v2" of the syscall. Moreover, even if a new syscall is added, it's often added with further generality than the old interface needed, for extensibility or to be useful for satisfying other existing needs. Therefore it might not correspond directly to the function being implemented, just provide a means to obtain the functionality.

Why was a readdir function added to POSIX library interface when there is a readdir kernel function?

I was surprised to discover the man pages having entries for two conflicting variants of readdir.
in READDIR(2), it specifically states you do not want to use it:
This is not the function you are interested in. Look at readdir(3) for the POSIX conforming C library interface. This page documents the bare kernel system call interface, which is superseded by getdents(2).
I understand a function may become deprecated when another function comes along and does its job better, but I am not familiar with other cases of a userspace function coming in and replacing a kernel function of the same name. Is there a known reason it was chosen to go this route rather than coming up with a new function name (as the man page mentions getdents did when superseding readdir).

The programming interface, POSIX, is stable. You don't just go replacing functions in it unnecessarily because you want to implement the backend more efficiently. The Linux syscall readdir never implemented the readdir function because it has the wrong signature; it was an old, inefficient backend for implementing the readdir function. When a better backend came along, it was obsolete.

You have it completely backwards: it's the library function readdir(3) which predates Linux and its readdir(2) system call, and not the reverse.
Naming the syscall that way was certainly a poor decision, and probably has a story behind it, but it's pretty much irrelevant now, as nobody is using it.
On Unix, directories used to be simple files formatted in a special way, and the system call interface through which they were read was just read(2) [1]. Later systems introduced system calls like getdirentries (44BSD) and getdents (SVR3), but they weren't willing or capable to standardize on an interface, so we're still stuck with the high level and broken [2] readdir(3) library function as the only standard interface for reading a directory.
[1] On some systems like BSD you can still cat a directory, at least when using the default filesystem (FFS).
[2] it's broken because it's not signal safe, and it returns NULL for both error and EOF, which means that the only way it could be safely used is by first setting errno to 0, and checking both its return value and errno afterwards. Yuck.

How to intercept C library calls in windows?

I have a devilish-gui.exe, a devilish.dll and a devilish.h from a C codebase that has been lost.
devilish-gui is still used from the customer and it uses devilish.dll
devilish.h is poorly documented in a 30-pages pdf: it exposes a few C functions that behave in very different ways according to the values in the structs provided as arguments.
Now, I have to use devilish.dll to write a new devilish-webservice. No, I can't rewrite it.
The documentation is almost useless, but since I have devilish-gui.exe I'd like to write a different implementation of the devilish.h so that it log function's call and arguments in a file, and than calls the original dll function. Something similar to what ltrace does on linux, but specialized for this weird library.
How can I write such "intercepting" dll on windows and inject it between devilish.dll and devilish-gui.exe?

A couple of possibilities:
Use Detours.
If you put your implementation of devilish.dll in the same directory as devilish-gui.exe, and move the real implementation of devilish.dll into a subdirectory, Windows will load your implementation instead of the real one. Your implementation can then forward to the real one. I'm assuming that devilish-gui isn't hardened against search path attacks.
Another approach would be to use IntelliTrace to collect a trace log of all the calls into devilish.dll.

Is it possible to LD_PRELOAD a function with different parameters?

Say I replace a function by creating a shared object and using LD_PRELOAD to load it first. Is it possible to have parameters to that function different from the one in original library?
For example, if I replace pthread_mutex_lock, such that instead of parameter pthread_mutex_t it takes pthread_my_mutex_t. Is it possible?
Secondly, besides function, is it possible to change structure declarations using LD_PRELOAD? For example, one may add one more field to a structure.

Although you can arrange to provide your modified pthread_mutex_lock() function, the code will have been compiled to call the standard function. This will lead to problems when the replacement is called with the parameters passed to the standard function. This is a polite way of saying:
Expect it to crash and burn
Any pre-loaded function must implement the same interface — same name, same arguments in, same values out — as the function it replaces. The internals can be implemented as differently as you need, but the interface must be the same.
Similarly with structures. The existing code was compiled to expect one size for the structure, with one specific layout. You might get away with adding an extra field at the end, but the non-substituted code will probably not work correctly. It will allocate space for the original size of structure, not the enhanced structure, etc. It will never access the extra element itself. It probably isn't quite impossible, but you must have designed the program to handle dynamically changing structure sizes, which places severe enough constraints on when you can do it that the answer "you can't" is probably apposite (and is certainly much simpler).
IMNSHO, the LD_PRELOAD mechanism is for dire emergencies (and is a temporary band-aid for a given problem). It is not a mechanism you should plan to use on anything remotely resembling a regular basis.

LD_PRELOAD does one thing, and one thing only. It arranges for a particular DSO file to be at the front of the list that ld.so uses to look up symbols. It has nothing to do with how the code uses a function or data item once found.
Anything you can do with LD_PRELOAD, you can simulate by just linking the replacement library with -l at the front of the list. If, on the other hand, you can't accomplish a task with that -l, you can't do it with LD_PRELOAD.

The effects of what you're describing are conceptually the same as the effects of providing a mismatching external function at normal link time: undefined behavior.
If you want to do this, rather than playing with fire, why don't you make your replacement function also take pthread_mutex_t * as its argument type, and then just convert the pointer to pthread_my_mutex_t * in the function body? Normally this conversion will take place only at the source level anyway; no code should be generated for it.

Some general C questions

I am trying to fully understand the process pro writing code in some language to execution by OS. In my case, the language would be C and the OS would be Windows. So far, I read many different articles, but I am not sure, whether I understand the process right, and I would like to ask you if you know some good articles on some subjects I couldn´t find.
So, what I think I know about C (and basically other languages):
C compiler itself handles only data types, basic math operations, pointers operations, and work with functions. By work with functions I mean how to pass argument to it, and how to get output from function. During compilation, function call is replaced by passing arguments to stack, and than if function is not inline, its call is replaced by some symbol for linker. Linker than find the function definition, and replace the symbol to jump adress to that function (and of course than jump back to program).
If the above is generally true and I get it right, where to final .exe file actually linker saves the functions? After the main() function? And what creates the .exe header? Compiler or Linker?
Now, additional capabilities of C, today known as C standart library is set of functions and the declarations of them, that other programmers wrote to extend and simplify use of C language. But these functions like printf() were (or could be?) written in different language, or assembler. And there comes my next question, can be, for example printf() function be written in pure C without use of assembler?
I know this is quite big question, but I just mostly want to know, wheather I am right or not. And trust me, I read a lots of articles on the web, and I would not ask you, If I could find these infromation together on one place, in one article. Insted I must piece by piece gather informations, so I am not sure if I am right. Thanks.

I think that you're exposed to some information that is less relevant as a beginning C programmer and that might be confusing you - part of the goal of using a higher level language like this is to not have to initially think about how this process works. Over time, however, it is important to understand the process. I think you generally have the right understanding of it.
The C compiler merely takes C code and generates object files that contain machine language. Most of the object file is taken by the content of the functions. A simple function call in C, for example, would be represented in the compiled form as low level operators to push things into the stack, change the instruction pointer, etc.
The C library and any other libraries you would use are already available in this compiled form.
The linker is the thing that combines all the relevant object files, resolves all the dependencies (e.g., one object file calling a function in the standard library), and then creates the executable.
As for the language libraries are written in: Think of every function as a black box. As long as the black box has a standard interface (the C calling convention; that is, it takes arguments in a certain way, returns values in a certain way, etc.), how it is written internally doesn't matter. Most typically, the functions would be written in C or directly in assembly. By the time they make it into an object file (or as a compiled library), it doesn't really matter how they were initially created, what matters is that they are now in the compiled machine form.
The format of an executable depends on the operating system, but much of the body of the executable in windows is very similar to that of the object files. Imagine as if someone merged together all the object files and then added some glue. The glue does loading related stuff and then invokes the main(). When I was a kid, for example, people got a kick out of "changing the glue" to add another function before the main() that would display a splash screen with their name.
One thing to note, though is that regardless of the language you use, eventually you have to make use of operating system services. For example, to display stuff on the screen, to manage processes, etc. Most operating systems have an API that is also callable in a similar way, but its contents are not included in your EXE. For example, when you run your browser, it is an executable, but at some point there is a call to the Windows API to create a window or to load a font. If this was part of your EXE, your EXE would be huge. So even in your executable, there are "missing references". Usually, these are addressed at load time or run time, depending on the operating system.

I am a new user and this system does not allow me to post more than one link. To get around that restriction, I have posted some idea at my blog http://zhinkaas.blogspot.com/2010/04/how-does-c-program-work.html. It took me some time to get all links, but in totality, those should get you started.

The compiler is responsible for translating all your functions written in C into assembly, which it saves in the object file (DLL or EXE, for example). So, if you write a .c file that has a main function and a few other function, the compiler will translate all of those into assembly and save them together in the EXE file. Then, when you run the file, the loader (which is part of the OS) knows to start running the main function first. Otherwise, the main function is just like any other function for the compiler.
The linker is responsible for resolving any references between functions and variables in one object file with the references in other files. For example, if you call printf(), since you do not define the function printf() yourself, the linker is responsible for making sure that the call to printf() goes to the right system library where printf() is defined. This is done at compile-time.
printf() is indeed be written in pure C. What it does is call a system call in the OS which knows how to actually send characters to the standard output (like a window terminal). When you call printf() in your program, at compile time, the linker is responsible for linking your call to the printf() function in the standard C libraries. When the function is passed at run-time, printf() formats the arguments properly and then calls the appropriate OS system call to actually display the characters.