What is PIC code on shared libraries?

This is an excerpt from The GNU Multiple Precision Arithmetic Library Manual:
On some CPUs, in particular the x86s, the static libgmp.a should be
used for maximum speed, since the PIC code in the shared libgmp.so
will have a small overhead on each function call and global data
address. For many programs this will be insignificant, but for long
calculations there’s a gain to be had.
In this context, what does PIC code mean?

PIC stands for Position Independent Code.
The word code in "PIC code" is redundant.
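As a rough illustration of where the overhead mentioned in the manual comes from, here is a tiny library source file. The file name counter.c and the names call_count and bump_counter are made up for this example. Compiled normally (gcc -c counter.c) for a static library, the code can refer to call_count by an address that the linker fixes once. Compiled with gcc -fPIC -c counter.c and linked with gcc -shared, the code cannot assume where it will be loaded, so on 32-bit x86 the access to call_count typically goes through the Global Offset Table, and calls into the library typically go through the Procedure Linkage Table. The C source is the same in both cases; only the generated machine code differs.

```c
/* counter.c: hypothetical library source, used only to illustrate PIC. */
#include <assert.h>

/* Global data. In position-independent code its address is usually
 * looked up indirectly at run time instead of being hard-coded. */
int call_count = 0;

/* An exported function. Callers in the main program usually reach a
 * shared-library function through an indirection stub. */
int bump_counter(int step)
{
    assert(step >= 0);      /* the sketch only counts upwards */
    call_count += step;
    return call_count;
}
```

Comparing the output of gcc -S with and without -fPIC is a simple way to see the difference. On x86-64 the difference tends to be much smaller than on 32-bit x86, because instruction-pointer-relative addressing is available there.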

Related

Low level languages and their dependencies

I am trying to understand exactly what it means that low-level languages are machine-dependent.
Let's take for example C, well if it is machine-dependent does it mean that if it was compiled on one computer it might not be able to run on another?
In the end, processors execute machine code, which is basically a collection of binary numbers. The processor decodes each binary number to figure out what it is supposed to do. One binary number could mean "Add register X to register Y and store the result in register Z". Another binary number could mean "Store the content of register X into the memory address held by register Y". And so on...
The complete description of these decoding rules (i.e. binary number into operation) represents the processor's instruction set (aka ISA).
A low-level language is a language where the code you write maps very closely to a specific processor's instruction set. Assembly is one obvious example. Since different processors may have different instruction sets, it's clear that an assembly program written for one processor's ISA can't be used on a processor with a different ISA.
Let's take for example C, well if it is machine-dependent does it mean that if it was compiled on one computer it might not be able to run on another?
Correct. A program compiled for one processor (family) can't run on another processor with a (completely) different ISA. The program needs to be recompiled.
Also notice that the target OS plays a role. If you use the same processor but a different OS, you'll also need to recompile.
There are at least 3 different kinds of languages:
1. A language that is so close to the target system's ISA that the source code can only be used on that specific target. Example: Assembly.
2. A language that allows you to write code that can be used on many different targets using a target-specific compilation. Example: C.
3. A language that allows you to write code that can be used on many different targets without a target-specific compilation. These still require some kind of target-specific runtime environment to be installed. Example: Java.
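The C case (one source, a separate compilation per target) can be sketched with the macros that compilers predefine for the target they are building for. The function names target_arch and describe_target are invented for this example, and the macro list is not exhaustive; it only covers a few common GCC, Clang and MSVC targets.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* The same source yields a different binary, and a different answer,
 * depending on which target the compiler was asked to build for. */
const char *target_arch(void)
{
#if defined(__x86_64__) || defined(_M_X64)
    return "x86-64";
#elif defined(__i386__) || defined(_M_IX86)
    return "x86 (32-bit)";
#elif defined(__aarch64__) || defined(_M_ARM64)
    return "AArch64";
#elif defined(__arm__)
    return "ARM (32-bit)";
#elif defined(__riscv)
    return "RISC-V";
#else
    return "unknown";
#endif
}

/* Writes a short description into buf and returns the length that
 * snprintf reports (0 on error). */
size_t describe_target(char *buf, size_t len)
{
    int n;

    assert(buf != NULL && len > 0);
    n = snprintf(buf, len, "%s, %zu-bit pointers",
                 target_arch(), sizeof(void *) * CHAR_BIT);
    return n < 0 ? 0 : (size_t)n;
}
```

Nothing in the source has to change between targets, but each resulting executable only runs on the processor family (and OS) it was compiled for.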
High-level languages are portable, meaning programs written in them can run on every architecture, but compared to low-level programs (like those written in Assembly or even machine code), they tend to be less efficient and consume more memory.
Low-level programs are known as "closer to the hardware", so they are optimized for a certain type of hardware architecture or processor: they are faster programs, but relatively machine-dependent, i.e. not very portable.
So a program compiled for one type of processor is not valid for other types; it needs to be recompiled.
In the before
When the first processors came out, there was no programming language whatsoever. You had very long and very complicated documentation with a list of "opcodes": the codes you had to put into memory for a given operation to be executed by your processor. To create a program, you had to put a long string of numbers in memory, and hope everything worked as documented.
Later came assembly languages. The point wasn't really to make algorithms easier to implement or to make the program readable by any human without any experience on the specific processor model you were working with; they were created to save you from spending days and days looking things up in the documentation. For this reason, there isn't "an assembly language" but thousands of them, one per instruction set (which, at the time, basically meant one per CPU model).
At this point in time, all languages were platform-dependent. If you decided to switch CPUs, you'd have to rewrite a significant portion (if not all) of your code. Recognizing that as a bit of a problem, someone created the first platform-independent language (according to this SE question it was FORTRAN in 1954) that could be compiled to run on any CPU architecture as long as someone made a compiler for it.
Fast forward a bit and C was invented. C is a platform-independent programming language, in the sense that any C program (as long as it conforms with the standard) can be compiled to run on any CPU (as long as this CPU has a C compiler). Once a C program has been compiled, the resulting file is a platform-dependent binary and will only be able to run on the architecture it was compiled for.
C is platform-dependent
There's an issue though: a processor is more than just a list of opcodes. Most processors have hardware control devices like watchdogs or timers that can be completely different from one architecture to another, even the way to talk to other devices can change completely. As such, if you want to actually run a program on a CPU, you have to include things that make it platform-dependent.
A real-life example of this is the Linux kernel. The majority of the kernel is written in C, but there's still around 1% written in different kinds of assembly. This assembly is required to do things such as initialize the CPU or use timers. Using this hack means Linux can run on your desktop x86_64 CPU, your ARM Android phone or a RISC-V SoC, but adding any new architecture isn't as simple as just "compile it with your architecture's compiler".
So... did I just say the only way to run a platform-independent language on an actual processor is to use platform-dependent code? Yes, for most architectures, you have to.
Or is it?
But there's a catch! That's only true if you want to run your code on bare metal (meaning: without an OS). One of the great things about using an OS is how abstracted everything is: you don't need to know how the kernel initializes the CPU, nor do you need to know how it gets its clock; you just need to know how to access those abstracted resources.
But the way of accessing those resources depends on the OS, so aren't we back to square one? We could be, if not for the standard library! This library is used to access functions like printf in a defined way. It doesn't matter if you're working on Linux running on PowerPC or on ARM Windows: printf will always print things on the standard output the same way.
If you write standard C using only the standard library (and intend for your program to run in an OS) C is completely platform-independent!
EDIT: As said in the comments below, even that is not enough. This doesn't really have anything to do with specific CPUs, but some things, such as the system function or the size of some types, are documented as implementation-defined. To make C really platform-independent you need to make sure to use only the well-defined functions of the standard library and learn some best practices (never rely on sizeof(int)==4, for instance).
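As a minimal sketch of the "never rely on sizeof(int)==4" advice: when code needs exactly 32 bits, it can say so with the fixed-width types from <stdint.h> and turn any remaining assumption into a compile-time check. The function names store_u32_le and load_u32_le are made up for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Make the one assumption this sketch has explicit, so a platform
 * that violates it fails to compile instead of misbehaving. */
_Static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

/* Store a 32-bit value as 4 little-endian bytes. The result does not
 * depend on the size of int or on the byte order of the host CPU. */
void store_u32_le(uint8_t out[4], uint32_t value)
{
    size_t i;

    assert(out != NULL);
    for (i = 0; i < 4; i++)
        out[i] = (uint8_t)(value >> (8 * i));
}

/* Read the 4 bytes back into a 32-bit value. */
uint32_t load_u32_le(const uint8_t in[4])
{
    uint32_t value = 0;
    size_t i;

    assert(in != NULL);
    for (i = 0; i < 4; i++)
        value |= (uint32_t)in[i] << (8 * i);
    return value;
}
```

Writing an int straight into a file or network buffer would instead bake the host's int size and byte order into the data, which is exactly the kind of implementation-defined behaviour the edit warns about.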
Thinking about 'what's a program' might help you understand your question. Is a program a collection of text (that you've typed in or otherwise manufactured) or is it something you run? Is it both?
In the case of a 'low-level' language like C, I'd say that the text is the program source, and that this is turned into a program (aka executable) by a compiler. A program is something you can run. You need a C compiler for a system to be able to turn the program source into a program for that system. Once built, the program can only be run on systems close to the one it was compiled for. However, there is a more interesting, if more difficult, question: can you at least keep the program source the same, so that all you need to do is recompile? The answer to this is, I think, 'sort of no'. For example, you can't, in pure C, read the state of the shift key. Of course operating systems provide such facilities and you can interface to those in C, but then such code depends on the OS. There might be libraries (e.g. the curses library) that provide such facilities for many OSes and that can help to reduce the dependency, but no library can claim to portably cover all OSes.
In the case of a 'higher-level' language like Python, I'd say the text is both the program and the program source. There is no separate compilation stage with such languages, but you do need an interpreter on a system to be able to run your Python program on that system. That this is happening may not be clear to the user, though, as you may well seem to be able to run your Python 'program' just by naming it, the way you run your C programs. But this most likely comes down to the shell (the part of the OS that deals with commands) knowing about Python programs and invoking the interpreter for you. It can appear, then, that you can run your Python program anywhere, but in fact what you can do is pass the program to any Python interpreter.
In the zoo of programming there are not only many, very varied beasts, but new kinds of beasts arise all the time, and old beasts metamorphose. Terms like 'program', 'script' and even 'executable' are often used loosely.

What is __latent_entropy used for in C

I would like to understand in which cases we use the keyword __latent_entropy in a C function signature.
I saw some Google results talking about a GCC plugin, but I still don't understand what its impact is.
Thanks
You can have a look at the Kconfig description of what enabling the latent_entropy GCC plugin does (it also mentions its impact on Linux's performance):
config GCC_PLUGIN_LATENT_ENTROPY
    bool "Generate some entropy during boot and runtime"
    help
      By saying Y here the kernel will instrument some kernel code to
      extract some entropy from both original and artificially created
      program state. This will help especially embedded systems where
      there is little 'natural' source of entropy normally. The cost
      is some slowdown of the boot process (about 0.5%) and fork and
      irq processing.
      Note that entropy extracted this way is not cryptographically
      secure!
      This plugin was ported from grsecurity/PaX. More information at:
       * https://grsecurity.net/
       * https://pax.grsecurity.net/
Here you'll find a more detailed description of the latent_entropy GCC plugin. Some content taken from the link:
...
this is where the new gcc plugin comes in: we can instrument the kernel's
boot code to do some hash-like computation and extract some entropy from
whatever program state we decide to mix into that computation. a similar
idea has in fact been implemented by Larry Highsmith of Subreption fame
in http://www.phrack.org/issues.html?issue=66&id=15 where he (manually)
instrumented the kernel's boot code to extract entropy from a few kernel
variables such as time (jiffies) and context switch counts.
the latent entropy plugin takes this extraction to a whole new level. first,
we define a new global variable that we mix into the kernel's entropy pools
on each initcall. second, each initcall function (and all other boot-only
functions they call) gets instrumented to compute a 'random' number that
gets mixed into this global variable at the end of the function (you can
think of it as an artificially created return value that each instrumented
function computes for our purposes). the computation is a mix of add/xor/rol
(the happy recovery Halvar mix :) with compile-time chosen random constants
and the sequence of these operations follows the instrumented functions's
control flow graph. for the rest of the gory details see the source code ;).
...
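To make the quoted description a little more concrete, here is a hand-written user-space sketch of the idea. It is not the plugin's real output, and every name in it except __latent_entropy is made up. The assumptions are: the marker expands to a GCC attribute when the plugin is enabled and to nothing otherwise (here it is simply defined as empty), the constants stand in for the compile-time random ones, and latent_entropy_sketch stands in for the global variable that the kernel later mixes into its entropy pools.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: without the plugin the marker is an empty macro, so it
 * only tags which functions should be instrumented. */
#ifndef __latent_entropy
#define __latent_entropy
#endif

/* Stand-in for the global variable described in the quote. */
uint64_t latent_entropy_sketch;

/* Rotate left, the "rol" of the add/xor/rol mix. */
uint64_t rol64_sketch(uint64_t value, unsigned int shift)
{
    assert(shift < 64);
    return shift ? (value << shift) | (value >> (64 - shift)) : value;
}

/* A marked function, with the instrumentation written out by hand.
 * Which operation runs depends on the branch taken, so the value
 * mixed in at the end reflects the path through the function. */
__latent_entropy int demo_initcall(int arg)
{
    uint64_t local_entropy = 0x9e3779b97f4a7c15ULL; /* arbitrary constant */
    int result;

    if (arg > 0) {
        result = arg * 2;
        local_entropy += 0x0123456789abcdefULL;     /* "add" on this path */
    } else {
        result = 0;
        local_entropy ^= 0xfedcba9876543210ULL;     /* "xor" on this path */
    }
    local_entropy = rol64_sketch(local_entropy, 13);

    /* At the end of the function, fold the local value into the global. */
    latent_entropy_sketch ^= local_entropy;
    return result;
}
```

The function's visible behaviour (its return value) is unchanged; the only impact is the extra arithmetic, which is where the small slowdown mentioned in the Kconfig help text comes from.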

Why some kernel actions cannot be written in C

I apologize if this sounds trivial, but I couldn't figure out an intuitive way to google it: why are some kernel actions, like saving the current state of the registers and the stack (just to mention a few), written in assembly? Why can't they be written in C, since after all, presumably, when compilation is done, all we get is object code? Besides, when you use OllyDbg, you notice that before a function call (in C), the current state of the registers is pushed to the stack.
When writing an OS the main goal is to maintain the highest abstraction to make the code reusable on different architectures, but at the end inevitably there is the architecture.
Each machine performs the very low level functions in such a specialized way that no general programming language can sustain.
Task switching, bus control and device interrupt handling, just to name a few, cannot be coded efficiently using a high-level language (consider instruction sequences, the registers involved, and any critical CPU timings and priority levels).
On the other hand, it is not even convenient to use mixed programming, i.e. inline assembler, because the crafted module will no longer be abstract: it will contain architecture-specific code that can't be reused.
The common solution is to write all code at the highest abstraction level, reducing the specialized code to a few modules. These routines, fully written in assembly, are normally well defined in terms of supplied input and expected output, so the programmer can produce the same results on different architectures.
Compiling for a different CPU is then done by simply switching the set of assembly routines.
C does not assure you that it modifies the registers you need to modify.
C just implements a logic you write in your code and the interpretation given by the language will be as you expect, hiding completely the details behind the interpretation.
If you want logic like "set the X register to a given value" or "move data from register X to register Y", as is sometimes necessary in a kernel, that kind of logic is not defined by the C language.
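As a small sketch of that point: standard C has no way to name a CPU register, so reading one requires a compiler extension. The example below uses GCC-style extended inline assembly (also accepted by Clang) to read the stack pointer on x86-64 and AArch64, and falls back to returning 0 elsewhere. The function names are made up for this example, and inline assembly is a compiler extension, not part of the C language itself.

```c
#include <assert.h>
#include <stdint.h>

/* Reports whether this build knows how to read the stack pointer. */
int stack_pointer_supported(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__aarch64__))
    return 1;
#else
    return 0;
#endif
}

/* Reads the stack pointer register. Each architecture needs its own
 * instruction and register name, which is why this cannot be written
 * once in portable C. */
uint64_t read_stack_pointer(void)
{
#if defined(__GNUC__) && defined(__x86_64__)
    uint64_t sp;
    __asm__ volatile ("mov %%rsp, %0" : "=r"(sp));
    return sp;
#elif defined(__GNUC__) && defined(__aarch64__)
    uint64_t sp;
    __asm__ volatile ("mov %0, sp" : "=r"(sp));
    return sp;
#else
    return 0;   /* no portable way to express this */
#endif
}

/* Usage check: on supported targets the value is a real address. */
void check_stack_pointer(void)
{
    if (stack_pointer_supported())
        assert(read_stack_pointer() != 0);
}
```

Even this tiny function needs one branch per architecture, which mirrors how kernels keep a separate set of low-level routines for each supported CPU.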
C is a generic high-level language, not specific to one target. But at the kernel level there are things you need to do that are target-specific and that the C language simply cannot do, such as enabling an interrupt, configuring an MMU, or configuring something to do with the protection mechanism. On some targets these items and others are configured using registers in the address space, but on some targets specific assembly language instructions are required, so C cannot be used; it has to be assembly. There is usually at least one thing per target that you have to use assembly for, if not many.
Sometimes it is simply a case of wanting the correct instruction to be used. For example, if a 32-bit store must be used for some operation, then to ensure that, rather than hope the compiler gets it right, use asm.
There is no C equivalent for "return from exception". The C compiler can't translate everything that assembly can do. For instance, if you write an operating system you will need a special return function in the interrupt service routine that goes back to where the interrupt was initiated. The C compiler can't generate such functionality; it can only be expressed in assembly.
See also Is assembly strictly required to make the "lowest" part of an operating system?
Context switching is critical and needs to be really, really fast, so it should not be written in a high-level language.

Does libjit dynamically translate a piece of code to something executable?

Is GNU libjit meant to translate a piece of code into something executable (say, machine code for x86) at run time? I don't see how the examples from the libjit tutorial actually show this. Any ideas? Thanks.
Not as a single step. It's probably more appropriate to say that libjit can generate executable code in memory, if given a low-level description of what that code should do. That's what the calls to functions like jit_insn_add() in the tutorial are all about; libjit converts a sequence of notional instructions (e.g, "add register 1 to register 2", "perform the next block of instructions if register 3 is zero") fed to it into a sequence of bytes in memory which can be run by your CPU to perform those operations.
If you want to convert a textual representation of some code (e.g, the string a = b + c;) to executable code of some variety, that's an entirely different and unrelated task. A full explanation of everything involved is beyond the scope of an answer on this site, but a general study of formal compiler implementation would be my recommended starting point. (Ignore for a moment that you intend to execute this code at runtime, rather than compiling it to an executable; this has surprisingly little bearing on the techniques used.) An excellent textbook on the subject is "Compilers: Principles, Techniques, and Tools", aka. the "Dragon Book".

Porting from PLM51 to C

I am doing a project where I need to port code from PLM51 to C.
The 8051 architecture is being used. The microcontroller is ROMless and an external memory of 64 KB is being used. The PLM51 code size is almost 63 KB.
So my question is: when I port my code from PLM51 to C, will the code size increase or decrease?
What are the parameters which will decide the increase/decrease in size?
To start out I must say that while I have written in both languages, I have not done a port from PL/M to C or compared the sizes of similar programs written in the two languages.
This question is very difficult to answer with any degree of certainty but the two languages are fairly similar in their level, being fairly low level portable languages. I seem to remember our rule of thumb for PL/M was an average of around 5 assembler instructions per PL/M statement. This efficiency will vary between compilers and optimisation levels.
One factor that may have a large impact on the code size of the final image is the external libraries that may be included by the linker. A particular culprit is the printf formatter, which is typically quite large. In PL/M you would normally write your own character output functions tailored to your specific needs, often resulting in smaller code.
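The same approach works in C. As a sketch, assuming the program only ever needs to print unsigned 16-bit values in decimal, a small tailored formatter like the one below can replace printf("%u", ...) for that purpose. The name format_u16 is made up for this example, and how the resulting characters reach the serial port or display is left out because it depends on the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes value as decimal digits plus a terminating '\0' into out,
 * which must have room for at least 6 bytes ("65535" and the
 * terminator). Returns the number of digits written. */
size_t format_u16(uint16_t value, char *out)
{
    char digits[5];     /* a 16-bit value has at most 5 decimal digits */
    size_t count = 0;
    size_t i;

    assert(out != NULL);    /* compiled out when NDEBUG is defined */

    /* Produce the digits least significant first. */
    do {
        digits[count++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    /* Copy them out in reading order. */
    for (i = 0; i < count; i++)
        out[i] = digits[count - 1 - i];
    out[count] = '\0';
    return count;
}
```

Whether this actually saves space depends on the compiler and its library: the gain only appears if no other part of the program still pulls in the printf family.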

Resources