Compile binary data into a C program and use it like a file - C

I have a C library which uses a set of binary data files (read-only). One of these files, let's call it f1.dat, is used in 99% of the applications that use the library, while the other 59 files f2.dat .. f60.dat are used only rarely.
I would like to compile the data of f1.dat directly into the library. Users of the library who never wish to use the data in files f2.dat .. f60.dat would then not have to carry an extra data file around; the compiled library .dll or .so would work without extra resources for those users.
The most convenient solution would be if the memory area holding the data could be accessed with the same function calls (fseek, ftell, fread) as the data in a file. For the application it should make no difference whether it reads an external file or this in-memory "file".
Is there a portable solution for this?
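One possible approach, sketched below: embed f1.dat as a byte array (for example with xxd -i f1.dat) and wrap it with fmemopen(3), which returns a FILE* on which fseek, ftell and fread work unchanged. Note that fmemopen is POSIX.1-2008, not ISO C, so this is not fully portable (it is missing on Windows, for instance). The array here is a small inline stand-in for what the generator tool would emit:

    #define _POSIX_C_SOURCE 200809L  /* for fmemopen */
    #include <stdio.h>

    /* Stand-in for the array a tool like `xxd -i f1.dat` would generate
       (xxd's naming convention would be f1_dat / f1_dat_len). */
    static unsigned char f1_dat[] = { 'H', 'e', 'l', 'l', 'o', '\n' };
    static unsigned int f1_dat_len = sizeof f1_dat;

    int main(void) {
        /* Wrap the in-memory data in a FILE*; "rb" makes it read-only. */
        FILE *f = fmemopen(f1_dat, f1_dat_len, "rb");
        if (f == NULL)
            return 1;

        unsigned char header[4];
        size_t n = fread(header, 1, sizeof header, f);
        printf("read %zu bytes, stream position is now %ld\n", n, ftell(f));

        fclose(f);
        return 0;
    }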

Related

Does a C program load everything into memory?

I've been practicing C today and something came to my mind. Whenever C code is run, does it load all the files needed for execution into memory? Like, do the main.c file and its header files get copied into memory? What happens if you have a complete C program that takes up 1 GB or something large?
A C program is first compiled into a binary executable, so header files, source files, etc. do not exist anymore at this point... unless you compiled your binary with debugging information (the -g flag).
This is a huge topic. Generally the executable is mapped into what's called virtual memory, which allows addressing more space than is physically available in your machine (through paging). When you try to access code segments that are not yet loaded, a page fault occurs and the OS fetches what's missing. Compilers will often reorder functions to avoid executing code from random memory locations, so most of the time you're executing only a small part of your binary.
If you look into specific domains such as HPC or embedded devices the loading policies will likely be different.
C is a compiled language, not an interpreted one.
This means that the original *.c source file is never loaded at execution time. Instead, the compiler will process it once, to produce an executable file containing machine language.
Therefore, the size of the source file doesn't directly matter. It may well be very large, covering a lot of different use cases, yet produce a tiny executable, because only the applicable case is picked at compilation time. Most of the time the executable size remains correlated with its source, but that doesn't necessarily mean it will end up being huge.
Also, the *.h header files included at the top of C source files do not actually "import" a dependency (as use, require, or import would in other languages). The #include directive merely inserts the content of a file at a given point; these files usually contain only function prototypes, variable declarations and some preprocessor #define clauses, which together form the API of an external resource that is linked to your program later.
These external resources are typically other object modules (when you have multiple *.c files in the same project and you don't need to recompile them all from scratch every time), static libraries, or dynamic libraries. The latter are DLL files under Windows and *.so files under Unix. In that case, the operating system automatically loads the required libraries when you run your program.
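As a minimal illustration of that point (the file names here are made up for the example): the header only declares the API, the definition is compiled once into greet.o, and the linker resolves the call in main.o against it, e.g. with gcc -c greet.c main.c && gcc greet.o main.o:

    /* greet.h - only declarations; #include pastes this text into every
       translation unit that includes it. */
    #ifndef GREET_H
    #define GREET_H
    void greet(const char *name);
    #endif

    /* greet.c - the definition, compiled once into an object file. */
    #include <stdio.h>
    #include "greet.h"

    void greet(const char *name) {
        printf("Hello, %s!\n", name);
    }

    /* main.c - uses only the prototype; the linker later resolves the
       call against the symbol in greet.o (or a static/dynamic library). */
    #include "greet.h"

    int main(void) {
        greet("world");
        return 0;
    }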

What's the relationship between dlfcn.c, ld-linux.so and libdl.so?

I'm new to C and linkers, sorry if my question sounds weird.
I checked online and found that dlfcn.c and ld-linux.so are both called the dynamic linker, and then there is libdl.so, which by its name is the dynamic-linker library. So what's the relationship between them?
Are dlfcn.c and other essential .c files used to generate ld-linux.so? If so, what's the difference between ld-linux.so and libdl.so?
ld-linux.so
... is what I call "the dynamic linker":
This file is loaded by the Linux kernel together with an ELF file when the ELF file requires dynamic libraries.
The file ld-linux.so contains the code that loads the dynamic libraries (for example libc.so) needed by the ELF file from the disk to memory.
libdl.so
This file is a dynamic library that contains functions like dlopen() or dlsym():
These functions allow a program to "dynamically" load dynamic libraries - this means the program can call a function to load a dynamic library.
One of many use cases is plug-ins that the user may configure in some configuration dialog (so these plug-ins do not appear in the list of required files stored inside the executable file).
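For instance, here is a small sketch of that API, using libm as the library to load (on glibc before 2.34 you link this with -ldl; on newer glibc the dl* functions live directly in libc):

    #include <stdio.h>
    #include <dlfcn.h>   /* dlopen, dlsym, dlerror, dlclose */

    int main(void) {
        /* Load the math library at run time instead of link time. */
        void *handle = dlopen("libm.so.6", RTLD_LAZY);
        if (handle == NULL) {
            fprintf(stderr, "dlopen: %s\n", dlerror());
            return 1;
        }

        /* Look up the symbol "cos" and cast it to the proper type. */
        double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
        if (cosine == NULL) {
            fprintf(stderr, "dlsym: %s\n", dlerror());
            dlclose(handle);
            return 1;
        }

        printf("cos(0.0) = %f\n", cosine(0.0));
        dlclose(handle);
        return 0;
    }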
dlfcn.c
I'm not absolutely sure, but this file seems to be part of the source code of libdl.so.

Having a .c source file store .txt information at compile time

I'm using C to build an RTEMS application for a given target (a LEON processor, more specifically).
When doing the various tutorials I noticed that, since it isn't possible to load the simulation .txt files, the solution is to have .c source files (let's call them inputs.c) holding the various 512x512 global input matrices, which are referenced as extern within the main file.
I've been trying to find information about this procedure but haven't found any.
My question: in the documentation of the example they state that at some point they are going to transfer the global matrices in inputs.c from the PC to the target via UART. Isn't the inputs.c file loaded into the LEON processor along with all the other .c files?
I think some information is missing to completely understand what your environment is...
But it could be that the data in inputs.c is linked into a separate section (you should check the RTEMS linker file, cmdlnk).
That way it won't be loaded by grmon automatically, but only on a specific command.
Or perhaps you do actually upload the data at the same time as the executable code when you run "load" in grmon.
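For reference, a minimal sketch of the inputs.c pattern described in the question (the matrix is shrunk from 512x512 to 4x4 and the names are illustrative); where the data lands in memory, as the answer above notes, is then controlled by the linker script:

    /* inputs.c - data baked into the image at compile time. */
    const int input_matrix[4][4] = {
        { 1,  2,  3,  4},
        { 5,  6,  7,  8},
        { 9, 10, 11, 12},
        {13, 14, 15, 16},
    };

    /* main.c - references the data defined in inputs.c. */
    #include <stdio.h>

    extern const int input_matrix[4][4];

    int main(void) {
        printf("input_matrix[0][0] = %d\n", input_matrix[0][0]);
        return 0;
    }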

Finding file type in Linux programmatically

I am trying to find the file type of a file (like .pdf, .doc, .docx, etc.), but programmatically, not using a shell command. Actually, I have to make an application which blocks access to files of a particular extension. I have already hooked the sys_call_table in an LKM, and now I want my LKM to check the file type when an open/read system call is triggered.
I know that we have a current pointer which gives access to the current process structure, and we can use it to find the file name stored in the dentry structure. I also know that in Linux a file type is identified by a magic number stored in the starting bytes of the file. But I don't know how to find the file type or exactly where it is stored.
Linux doesn't "store" the file type for its files (unlike classic Mac OS with its resource fork, probably the best-known platform to do this). Files are just named streams of bytes; they have no structure implied by the operating system.
Either you just tell programs which file to use (and then it Does What You Say), or programs use higher-level features to figure it out.
There are programs that re-invent this particular wheel (I'm responsible for one of those), but you can also use e.g. file(1). Of course that requires your program to parse and "understand" the textual output you'll get, which in a sense only moves the problem.
However, I don't think calling into file from kernel space is very wise, so it's probably best to re-create the test for whatever set of types you need, to keep it small.
In other words, I mean you should simply re-implement the required tests. This is quite complicated in general, so if you really need to do it for as large a set of types as possible, it might not be a very good idea. :/
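To make "re-implement the required tests" concrete, here is a user-space sketch for a single type ("%PDF-" is the real PDF signature; the function name is illustrative). Inside an LKM you would read the first bytes with kernel facilities rather than stdio:

    #include <stdio.h>
    #include <string.h>

    /* Returns 1 if the file starts with the PDF signature "%PDF-". */
    static int looks_like_pdf(const char *path) {
        unsigned char buf[5];
        FILE *f = fopen(path, "rb");
        if (f == NULL)
            return 0;
        size_t n = fread(buf, 1, sizeof buf, f);
        fclose(f);
        return n == sizeof buf && memcmp(buf, "%PDF-", 5) == 0;
    }

    int main(int argc, char **argv) {
        if (argc > 1)
            printf("%s: %s\n", argv[1],
                   looks_like_pdf(argv[1]) ? "PDF" : "not PDF");
        return 0;
    }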
Actually i have to make an application which blocks access to files of a particular extension.
That's a flawed requirement. If you check by file extension, you'll miss files that don't use the extension, which is quite common on Linux since the system does not rely on file extensions.
The officially sanctioned way of detecting a file's type on Linux is by its magic number. The shell command file is basically just a wrapper around libmagic, so you have the option of linking to that library.
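If user space is an option, a minimal sketch of using libmagic directly looks like this (compile with -lmagic; note that libmagic cannot be called from inside an LKM):

    #include <stdio.h>
    #include <magic.h>   /* libmagic; link with -lmagic */

    int main(int argc, char **argv) {
        if (argc < 2) {
            fprintf(stderr, "usage: %s FILE\n", argv[0]);
            return 1;
        }

        magic_t cookie = magic_open(MAGIC_MIME_TYPE);
        if (cookie == NULL)
            return 1;

        /* Load the default magic database. */
        if (magic_load(cookie, NULL) != 0) {
            fprintf(stderr, "magic_load: %s\n", magic_error(cookie));
            magic_close(cookie);
            return 1;
        }

        const char *type = magic_file(cookie, argv[1]);
        printf("%s: %s\n", argv[1], type ? type : "unknown");

        magic_close(cookie);
        return 0;
    }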

Execute a C program at bootloader level via assembler

I wrote a custom (VERY basic "Hello world!") bootloader in assembler and I would like to execute a C program from it. Would the C program work, or fail due to a lost stdio.h file? And how could I bundle the C program along with the bootloader into a single .bin file to dd to a flash drive/CD?
I'm not sure what you mean by "lost stdio.h", but many C runtime functions, including those prototyped in stdio.h, are implemented using system calls. Without an OS running, those system calls won't work.
It is possible to write C code that runs without an OS; for example, most common bootloaders have just a tiny amount of assembler and mostly C code. The trick is to avoid using the runtime libraries. Alternatives to syscalls, e.g. for display, are BIOS calls and hardware-specific I/O (see the sketch at the end of this answer).
To take just one example, in addition to dynamic allocation, fopen in read mode needs the following low-level operations:
Reading a block of data from storage
Reading the file system metadata (often, superblock and root directory)
Processing file system metadata to find out where the file content is stored
Creating a FILE object that contains enough information for fread and fgetc to find the data on disk
You don't have an OS to help with any of that; your C code will need to implement a driver (possibly calling the BIOS) for block reads and implement the behavior of the other steps.
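As a minimal sketch of freestanding C in that spirit, assuming the boot code has already switched to 32-bit protected mode and jumps to kmain: it is built with something like gcc -m32 -ffreestanding -nostdlib plus a suitable linker script, and "prints" by writing directly to the VGA text buffer instead of calling any runtime function:

    /* kmain.c - no libc, no syscalls; writes character + attribute pairs
       into the memory-mapped VGA text buffer at 0xB8000. */
    void kmain(void) {
        volatile unsigned short *vga = (volatile unsigned short *)0xB8000;
        const char *msg = "Hello from C!";

        for (int i = 0; msg[i] != '\0'; ++i)
            vga[i] = (unsigned short)((0x07 << 8) | msg[i]); /* grey on black */

        for (;;)
            ;  /* nothing to return to, so hang */
    }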
