How to get x86 Linux kernel definitions

Let's say I want to find the real definition of O_APPEND, one of the flags for the open() syscall. By real definition, I mean a number like 0x2.
I go to LXR and do a search:
Defined in 5 files:
arch/alpha/include/uapi/asm/fcntl.h, line 10 (as a macro)
arch/mips/include/uapi/asm/fcntl.h, line 13 (as a macro)
arch/parisc/include/uapi/asm/fcntl.h, line 4 (as a macro)
arch/sparc/include/uapi/asm/fcntl.h, line 4 (as a macro)
include/uapi/asm-generic/fcntl.h, line 35 (as a macro)
However, there's no x86 here. Why?
I faced the same problem when I needed to look up syscall numbers, and I ended up using a third-party website with a generated table.
If I understood correctly, syscall tables are generated at build time, so there's no way to look them up until the kernel has been preprocessed for a specific architecture.
Is it the same story for all the #defines for x86 and x86_64? How do I proceed when I need something that isn't already on the Internet, generated by someone else? I could have looked it up in the headers on my desktop, but I use x86_64, not x86.
So, how do I find the exact numbers that flags and modes are #defined to for the x86 architecture?

In the case of the user-space API (uapi), it is defined in include/uapi/asm-generic/fcntl.h.
The asm-generic part of the path means this is architecture-independent code: x86 does not override these flag definitions, so it picks up the generic value. For O_APPEND that is 02000 (octal), i.e. 0x400.
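If you just need the concrete number for a given target, you can also ask the toolchain itself. A minimal sketch; compile with gcc -m32 for 32-bit x86, or without it for x86_64:

#include <stdio.h>
#include <fcntl.h>

int main(void)
{
    /* The headers resolve O_APPEND for whichever target you compile for. */
    printf("O_APPEND = %#o (octal) = %#x (hex)\n", O_APPEND, O_APPEND);
    return 0;
}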

Get start and end of a section

Foreword
There already exist questions like this one, but the ones I have found so far were either specific to a given toolchain (this solution works with GCC, but not with Clang), or specific to a given format (this one is specific to Mach-O). I tagged ELF in this question merely out of familiarity, but I'm trying to figure out as "portable" a solution as reasonably possible, for other platforms. Anything that can be done with GCC and GCC-compatible toolchains (like MinGW and Clang) would suffice for me.
Problem
I have an ELF section containing a collection of relocatable "gadgets" that are to be copied/injected into something that can execute them. They are completely relocatable in that the raw bytes of the section can be copied verbatim (e.g. by memcpy) to any [correctly aligned] memory location, with little to no relocation processing (done by the injector). The problem is that I don't have a portable way of determining the size of such a section. I could "cheat" a little by using --section-start, and outright determine a start address, but that feels a little hacky, and I still wouldn't have a way to get the section's end/size.
Overview
Most of the gadgets in the section are written in assembly, so the section itself is declared there. No languages were tagged because I'm trying to be portable with this and get it working for various architectures/platforms. I have separate assembly sources for each (e.g. ARM, AArch64, x86_64, etc).
; The injector won't be running this code, so no need to be executable.
; Relocations (if any) will be done by the injector.
.section gadgets, "a", #progbits
...
The more "heavy duty" code is written in C and compiled in via a section attribute.
__attribute__((section("gadgets")))
void do_totally_innocent_things();
Alternatives
I technically don't need to make use of sections like this at all. I could instead figure out the ends of each function in the gadget, and then copy those however I like. I figured using a section would be a more straightforward way to go about it, to keep everything in one modular relocatable bundle.
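For what it's worth, on ELF targets there is also a linker-provided route: GNU ld and LLD automatically define __start_<name> and __stop_<name> symbols for any section whose name is a valid C identifier. Since the symbols come from the linker rather than the compiler, this works with both GCC and Clang, although it does not carry over to Mach-O or PE. A minimal sketch for the gadgets section above:

/* ELF-only: these symbols are synthesized by the linker (GNU ld/LLD)
   for sections whose names are valid C identifiers. */
extern const unsigned char __start_gadgets[];
extern const unsigned char __stop_gadgets[];

static unsigned long gadgets_size(void)
{
    return (unsigned long)(__stop_gadgets - __start_gadgets);
}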
I'm not sure if you've considered this or whether it's out of the picture, but you could read the ELF headers.
This would be sort of 'universal', as you can do the same thing with Mach-O binaries.
So, for example: create three integer variables inside a 'custom_sect' section. These add up to 12 (0xC) bytes, which you can confirm by reading the headers back with readelf. (Screenshots of the readelf output and of the ELF section-header layout omitted.)
So each section has its own size property, which you can just read out.
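As a sketch of "just read the headers" in code rather than with readelf (assuming a 64-bit ELF and keeping error handling minimal):

#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <elf-file> <section>\n", argv[0]);
        return 1;
    }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1) return 1;

    /* Read the section header table... */
    Elf64_Shdr *sh = malloc(eh.e_shnum * sizeof *sh);
    fseek(f, (long)eh.e_shoff, SEEK_SET);
    if (fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum) return 1;

    /* ...and the section-name string table it points at. */
    char *names = malloc(sh[eh.e_shstrndx].sh_size);
    fseek(f, (long)sh[eh.e_shstrndx].sh_offset, SEEK_SET);
    if (fread(names, 1, sh[eh.e_shstrndx].sh_size, f) != sh[eh.e_shstrndx].sh_size)
        return 1;

    for (int i = 0; i < eh.e_shnum; i++)
        if (strcmp(names + sh[i].sh_name, argv[2]) == 0) {
            printf("%s: file offset %#llx, size %#llx\n", argv[2],
                   (unsigned long long)sh[i].sh_offset,
                   (unsigned long long)sh[i].sh_size);
            return 0;
        }

    fprintf(stderr, "section '%s' not found\n", argv[2]);
    return 1;
}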

Linking additional code for the microcontroller (AVR) against already existing code

The problem definition:
There is a need to have two parts of code in an AVR microcontroller: a fixed part that is always there and does not change (often), and a transient part that is (not so) often replaced or appended. The challenge is to give the transient code the ability to call functions and access global variables of the fixed code -- and vice versa.
It is quite obvious that there should be special methods for the fixed code to access the transient one -- like keeping calculated function pointers in RAM and using only them to call transient-code procedures.
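For illustration, such a table might look like the sketch below. Every name and the pinned address here is hypothetical; the section would be pinned at an agreed address with the linker (e.g. a --section-start option), with device-specific details omitted:

#include <stdint.h>

/* Hypothetical API table. All names and addresses are illustrative. */
typedef struct {
    void (*led_set)(uint8_t on);
    int  (*log_msg)(const char *msg);
} fixed_api_t;

/* In the fixed image: the table lives in a dedicated section that the
   linker pins at an agreed RAM address; fixed code fills it at startup. */
__attribute__((used, section(".fixed_api")))
fixed_api_t fixed_api;

/* In the transient image: reach the table through that agreed address. */
#define FIXED_API ((const fixed_api_t *)0x0100) /* illustrative address */

void transient_entry(void)
{
    FIXED_API->led_set(1);
    FIXED_API->log_msg("transient code running");
}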
For calls in the backward direction, I was thinking about linking the transient code against the existing .elf file of the fixed code.
I'm using avr-gcc toolchain (as in ubuntu 20.20), gcc version 5.4.0
What I've already tried:
adding '-shared' as a link argument when building the fixed code -- this appears to be unsupported for AVR (the linker reports an error);
adding '-Wl,--export-dynamic' instead -- it seems to be ignored, and no .dynsym section appears in the elf.
There is still a .symtab section in the fixed code elf -- could that be somehow used to link against it?
Note: my division into 'fixed' and 'transient' code has nothing to do with the boot area of some AVR microcontrollers; boot is just something I don't care about here.
Note 2: This question is much like this one, but gives a clear explanation of the need.
You have to forget all big-computer knowledge. 8-bit AVRs are tiny microcontrollers. Code has to be linked statically. There is no other way.

How is the type sf_count_t in sndfile.h defined in libsndfile?

I am trying to work with Nyquist (a music programming platform, see: https://www.cs.cmu.edu/~music/nyquist/ or https://www.audacityteam.org/about/nyquist/) as a standalone program and it utilizes libsndfile (a library for reading and writing sound, see: http://www.mega-nerd.com/libsndfile/). I am doing this on an i686 GNU/Linux machine (Gentoo).
After successfully setting up and launching the program without errors, I tried to generate sound via one of the examples, "(play (osc 60))", and was met with this error:
*** Fatal error : sizeof (off_t) != sizeof (sf_count_t)
*** This means that libsndfile was not configured correctly.
Investigating this further (and emailing the author) has proved somewhat helpful, but the solution is still far from my grasp. The author recommended looking at /usr/include/sndfile.h to see how sf_count_t is defined, and (this portion of) my file is identical to his:
/* The following typedef is system specific and is defined when libsndfile is
** compiled. sf_count_t will be a 64 bit value when the underlying OS allows
** 64 bit file offsets.
** On windows, we need to allow the same header file to be compiled by both GCC
** and the Microsoft compiler.
*/
#if (defined (_MSCVER) || defined (_MSC_VER))
typedef __int64 sf_count_t ;
#define SF_COUNT_MAX 0x7fffffffffffffffi64
#else
typedef int64_t sf_count_t ;
#define SF_COUNT_MAX 0x7FFFFFFFFFFFFFFFLL
#endif
In the above, the author notes there is no option for a "32 bit offset". I'm not sure how I would proceed. Here is the particular file the author of Nyquist recommended I investigate: https://github.com/erikd/libsndfile/blob/master/src/sndfile.h.in , and here is the entire source tree: https://github.com/erikd/libsndfile
Here are some relevant snippets from the authors email reply:
"I'm guessing sf_count_t must be showing up as 32-bit and you want
libsndfile to use 64-bit file offsets. I use nyquist/nylsf which is a
local copy of libsndfile sources -- it's more work keeping them up to
date (and so they probably aren't) but it's a lot easier to build and
test when you have a consistent library."
"I use CMake and nyquist/CMakeLists.txt to build nyquist."
"It may be that one 32-bit machines, the default sf_count_t is 32
bits, but I don't think Nyquist supports this option."
And here is the source code for Nyquist: http://svn.code.sf.net/p/nyquist/code/trunk/nyquist/
This problem is difficult for me to solve because it involves a niche use case of relatively obscure software. This also makes the support outlook for the problem a bit worrisome. I know a little C++, but I am far from confident in my ability to solve this. Thanks for reading, and happy holidays to all. If you have any suggestions, even in terms of formatting or editing, please do not hesitate!
If you look at the sources for the bundled libsndfile in nyquist, i.e. nylsf, then you see that sndfile.h is provided directly. It defines sf_count_t as a 64-bit integer.
The libsndfile sources, however, do not contain this file; instead they have sndfile.h.in. This is an input file for autoconf, a tool that generates the proper header file from this template. It currently has (and has had for a while) the following definition of sf_count_t for Linux systems:
typedef #TYPEOF_SF_COUNT_T# sf_count_t ;
The #TYPEOF_SF_COUNT_T# placeholder is replaced by autoconf to generate a header with a working type for sf_count_t on the system being built for. The header file provided by nyquist is therefore already configured (presumably for the author's system).
off_t is a type specified by the POSIX standard and defined in the system's libc. On a 32-bit system using the GNU C library, it is 32 bits wide.
This causes the sanity check in question to fail, because the sizes of sf_count_t and off_t don't match. The error message is also correct: we are building with a sndfile.h that was configured for a different system.
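You can mirror the failing check in a few lines and watch the mismatch directly (a sketch; the typedef is copied from the bundled header). On a 32-bit glibc system this prints 4 vs 8, and compiling with -D_FILE_OFFSET_BITS=64 makes them match:

#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>

typedef int64_t sf_count_t;   /* as in nylsf's pre-configured sndfile.h */

int main(void)
{
    /* Same comparison the failing sanity check performs. */
    printf("sizeof (off_t) = %zu, sizeof (sf_count_t) = %zu\n",
           sizeof (off_t), sizeof (sf_count_t));
    return sizeof (off_t) == sizeof (sf_count_t) ? 0 : 1;
}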
As I see it, you have the following options:
1. Ask the nyquist author to provide the unconfigured sndfile.h.in and to use autoconf to configure this file at build time.
2. Do not use the bundled libsndfile and link against the system's one. (This requires some knowledge and work to change the build scripts and header files, and may bring additional unexpected issues.)
3. If you are using the GNU C library (glibc): the preprocessor macro _FILE_OFFSET_BITS can be set to 64 to force off_t and the rest of the file interface to the 64-bit versions, even on 32-bit systems.
This may or may not work depending on whether your system supports it, and it is not a clean solution: other misconfigurations of libsndfile may go unnoticed, and the flag changes interfaces that the rest of the code relies on, potentially causing further build or runtime errors.
Nonetheless, I think the syntax for cmake would be to add:
add_compile_definitions(_FILE_OFFSET_BITS=64)
or depending on cmake version:
add_definitions(-D_FILE_OFFSET_BITS=64)
in the appropriate CMakeLists.txt.
4. Actually, the README in nyquist/nylsf explains how its files were generated. You may try to obtain the source code of the same libsndfile version it is based on and repeat the steps given, to produce an nylsf configured for your system. This may cause fewer problems than options 2 and 3, because no version or interface changes would be introduced.

What does __section( ) mean in linux kernel source

I see the following code in some OS kernel, but I don't understand the way __section is used, or what this code means.
#define KEEP_PAGER(sym) \
extern const unsigned long ____keep_pager_##sym; \
const unsigned long ____keep_pager_##sym \
__section("__keep_meta_vars_pager") = (unsigned long)&sym
It's a Linux-kernel-specific C macro wrapped around a GCC extension, specifying an attribute to apply to an object. It's a shorter way of writing the section attribute.
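Assuming the usual kernel definition, #define __section(s) __attribute__((__section__(s))), a use like KEEP_PAGER(foo) expands to roughly:

/* Expansion of KEEP_PAGER(foo), with __section() spelled out: */
extern const unsigned long ____keep_pager_foo;
const unsigned long ____keep_pager_foo
        __attribute__((__section__("__keep_meta_vars_pager"))) = (unsigned long)&foo;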
Historically the linux kernel has been written specifically for building with the GCC compiler, and makes extensive use of low level extensions to do specific hardware operations and optimisations.
The section attribute specifically determines the storage location of the object tagged with it. The ELF binary format arranges an object file into named sections, and using the attribute like this lets the programmer specify more precisely where the tagged object will be placed in the output file.
Over the years, work has been put into increasing the compatibility of these compiler extensions between different compilers, as well as making linux compilable with alternative compilers (if you look at the linux header file where the macro is defined, you'll see it is full of conditional directives for various compiler features). Macros like this can be a useful way to have a portable internal API for low-level features across different compiler implementations.
Unlike typical application-level C code, kernel and kernel-driver code is directly concerned with the specifics of the physical hardware, and needs to be explicit about the compiler's binary output in a way that application code rarely does.
One example of why the linux kernel uses named sections is in the init handling: functions and data that are only used during bootup are grouped into one section of memory that can easily be released once startup is complete. You may be familiar with the boot message along the lines of 'Freeing unused kernel memory: ...' towards the end of the Linux boot sequence.
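For instance, the kernel's __init marker is itself built on __section() (see include/linux/init.h); a minimal module-style sketch:

#include <linux/init.h>
#include <linux/module.h>

/* __init places this function in the ".init.text" section, so the
   kernel can free its memory once booting is finished. */
static int __init my_driver_init(void)
{
        pr_info("boot-only code, discarded after init\n");
        return 0;
}
module_init(my_driver_init);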
It is hard to tell what that __section is exactly without seeing its definition, but it is most likely the variable "section" attribute, used to make the compiler place a variable in a section other than "bss" or "data". See the GCC documentation for details.

Is there a reason even my tiniest .c files always compile to at least 128-kilobyte executables?

I am using Dev-C++, which compiles using GCC, on Windows 8.1, 64-bit.
I noticed that all my .c files always compiled to at least 128-kilobyte .exe files, no matter how small the source was. Even a simple "Hello, world!" was 128 KB. Source files with more lines of code increased the size of the executable as I would expect, but every file started at 128 KB, as if that were some sort of minimum size.
I know .exe's don't actually have a minimum size like that; .kkrieger is a full first-person shooter with 3d graphics and sound that all fit inside a single 96kb executable.
Trying to get to the bottom of this, I opened up my hello_world.exe in Notepad++. Perhaps my compiler adds a lengthy header that happens to be 128kb, I thought.
Unfortunately, I don't know enough about executables to be able to make sense of it, though I did find strings like "Address %p has no image-section VirtualQuery failed for %d bytes at address %p" buried among the usual garble of characters in an .exe.
Of course, this isn't a serious problem, but I'd like to know why it's happening.
Why is this 128kb minimum happening? Does it have something to do with my 64-bit OS, or perhaps with a quirk of my compiler?
Short answer: it depends.
Long answer: it depends on what operating system you have and how it handles executables.
Most (if not all) compilers do not hand you the absolute, raw x86/ARM/other architecture's machine code by itself. Instead, after packing your compiled source code into .o (object) files, they take the .o files and their libraries and "link" it all together into a standard executable format. These "executable formats" are essentially system-specific container formats: they hold the machine code together with the headers and metadata the OS loader needs to map the program into memory and hand those machine-code instructions to the CPU.
For example, I'll talk about the two most commonly used executable formats for Linux devices: ELF and ELF64 (I'll let you figure out the namesake difference yourself). ELF stands for Executable and Linkable Format. Every ELF-compiled program starts off with a 4-byte "magic number": a hexadecimal 0x7F followed by the string "ELF" in ASCII. The next byte is set to either 1 or 2, which signifies that the program is for 32-bit or 64-bit architectures, respectively. The byte after that signifies the program's endianness. Then there are a few more bytes that tell what the architecture is, and so on, up to a total of 64 bytes for the 64-bit header.
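Those first header bytes are easy to inspect yourself; a small sketch using the constants from <elf.h>:

#include <elf.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    unsigned char ident[EI_NIDENT];
    FILE *f = argc > 1 ? fopen(argv[1], "rb") : NULL;
    if (!f || fread(ident, 1, EI_NIDENT, f) != EI_NIDENT) return 1;

    /* Check the 0x7F "ELF" magic, then report class and endianness. */
    if (memcmp(ident, ELFMAG, SELFMAG) != 0) { puts("not an ELF file"); return 1; }
    printf("class: %s\n", ident[EI_CLASS] == ELFCLASS64 ? "64-bit" : "32-bit");
    printf("data:  %s-endian\n",
           ident[EI_DATA] == ELFDATA2LSB ? "little" : "big");
    return 0;
}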
However, 64 bytes is not even close to the 128K you have stated. That's because (aside from the fact that the Windows .exe format is usually much more complex) the C++ standard library is at fault here. For instance, let's have a look at a common use of the C++ iostream library:
#include <iostream>

int main()
{
    std::cout << "Hello, World!" << std::endl;
    return 0;
}
This program may compile to an extremely large executable on a Windows system, because the moment you add iostream to your program, the linker pulls big chunks of the C++ standard library into it, increasing your executable's size immensely.
So, how do we rectify this problem? Simple:
Use the C standard library implementation for C++!
#include <cstdio>

int main()
{
    printf("Hello, World!\n");
    return 0;
}
Simply using the original C standard library can decrease your size from a couple hundred kilobytes to a handful at most, simply because this way G++ no longer links the entire standard C++ library into your program.
However, sometimes you absolutely need to use the C++-specific libraries. In that case, a lot of toolchains have some kind of command-line option that essentially tells the linker "Hey, I'm only using, like, 2 functions from the standard C++ library, you don't need the whole thing." With GCC on Linux, this is the -nodefaultlibs option (a gcc/g++ driver flag; you then list just the libraries you actually need). I'm not entirely sure what the equivalent is on Windows, though. Of course, this can very quickly break a TON of calls in programs that make a lot of standard C++ calls.
So, in the end, I would worry more about simply rewriting your program to use the regular C functions instead of the new-fangled C++ functions, as amazing as they are, that is, if you're worried about size.
