I'd like to write some C code that can query processor attributes on PowerPC, much as one can with CPUID on x86. I'm after things like brand, model, stepping, SIMD width, and available operations, so the code can confirm at run time that it is running on a compatible platform before something blows up.
Is there a general mechanism for doing this on PowerPC? If so, where can one read about it?
Note that PowerPC does not have dozens of extensions/features like x86 does. Instead, you have to read specific privileged registers, which may vary between cores.
I checked on Linux: you can access the PVR; the kernel traps the privileged read and handles it for you.
Reading /proc/cpuinfo can tell you whether AltiVec is supported, the memory and L2 cache sizes, and so on, but parsing it is not really convenient.
A better solution is described here:
http://www.freehackers.org/thomas/2011/05/13/how-to-detect-altivec-availability-on-linuxppc-at-runtime/
It uses the content of /proc/self/auxv, which provides "the ELF interpreter information passed to the process at exec time".
The example is about AltiVec, but you can get other features (listed in asm/cputable.h): 32- or 64-bit CPU, AltiVec, SPE, FPU, MMU, 4xx MAC, ...
Finally, you will find information on caches (size, line size, associativity, ...) by looking at the files in:
/sys/devices/system/cpu/cpu0/cache
PowerPC doesn't have an analogue to the CPUID instruction. The closest you can get is to read the PVR (processor version register). This is a supervisor-privileged SPR, though. However, some operating systems, FreeBSD for example, will trap the instruction and execute it on behalf of user-space processes.
The PVR is read-only, and should be unique for any given processor model and revision. Given this, you can ascertain what features are provided by a given CPU.
I am trying to understand exactly what it means that low-level languages are machine-dependent.
Let's take for example C, well if it is machine-dependent does it mean that if it was compiled on one computer it might not be able to run on another?
In the end, processors execute machine code, which is basically a collection of binary numbers. The processor decodes each binary number to figure out what it is supposed to do. One binary number could mean "Add register X to register Y and store the result in register Z". Another could mean "Store the contents of register X at the memory address held in register Y". And so on...
The complete description of these decoding rules (i.e. binary number to operation) constitutes the processor's instruction set architecture (ISA).
A low-level language is a language where the code you write maps very closely to a specific processor's instruction set. Assembly is one obvious example. Since different processors may have different instruction sets, it's clear that an assembly program written for one processor's ISA can't be used on a processor with a different ISA.
Let's take for example C, well if it is machine-dependent does it mean that if it was compiled on one computer it might not be able to run on another?
Correct. A program compiled for one processor (family) can't run on another processor with (completely) different ISA. The program needs to be recompiled.
Also notice that the target OS plays a role too. Even on the same processor, a different OS means you'll need to recompile.
There are at least three different kinds of languages.
A language that is so close to the target system's ISA that the source code can only be used on that specific target. Example: assembly.
A language that allows you to write code that can be used on many different targets via a target-specific compilation. Example: C.
A language that allows you to write code that can be used on many different targets without a target-specific compilation. These still require some kind of target-specific runtime environment to be installed. Example: Java.
High-level languages are portable: any architecture with the appropriate toolchain can run high-level programs, but compared to low-level programs (written in assembly or even machine code) they are less efficient and consume more memory.
Low-level programs are "closer to the hardware" and are optimized for a certain type of hardware architecture/processor. They are faster, but relatively machine-dependent, i.e. not very portable.
So, a program compiled for one type of processor is not valid for other types; it needs to be recompiled.
In the before
When the first processors came out, there was no programming language whatsoever; you had a very long and very complicated document with a list of "opcodes": the codes you had to put into memory for a given operation to be executed by your processor. To create a program, you had to put a long string of numbers in memory and hope everything worked as documented.
Later came assembly languages. The point wasn't really to make algorithms easier to implement or to make the program readable by any human without experience on the specific processor model you were working with; it was created to save you from spending days and days looking things up in documentation. For this reason, there isn't "an assembly language" but thousands of them, one per instruction set (which, at the time, basically meant one per CPU model).
At this point in time, all languages were platform-dependent. If you decided to switch CPUs, you'd have to rewrite a significant portion (if not all) of your code. Recognizing that as a bit of a problem, someone created the first platform-independent language (according to this SE question it was FORTRAN in 1954) that could be compiled to run on any CPU architecture as long as someone made a compiler for it.
Fast forward a bit and C was invented. C is a platform-independent programming language, in the sense that any C program (as long as it conforms with the standard) can be compiled to run on any CPU (as long as this CPU has a C compiler). Once a C program has been compiled, the resulting file is a platform-dependent binary and will only be able to run on the architecture it was compiled for.
C is platform-dependent
There's an issue though: a processor is more than just a list of opcodes. Most processors have hardware control devices like watchdogs or timers that can be completely different from one architecture to another, even the way to talk to other devices can change completely. As such, if you want to actually run a program on a CPU, you have to include things that make it platform-dependent.
A real life example of this is the Linux kernel. The majority of the kernel is written in C but there's still around 1% written in different kinds of assembly. This assembly is required to do things such as initialize the CPU or use timers. Using this hack means Linux can run on your desktop x86_64 CPU, your ARM Android phone or a RISCV SoC but adding any new architecture isn't as simple as just "compile it with your architecture's compiler".
So... Did I just say the only way to run a platform-independent program on an actual processor is to use platform-dependent code? Yes, for most architectures, you have to.
Or is it?
But there's a catch! That's only true if you want to run your code on bare metal (meaning: without an OS). One of the great things about using an OS is how abstracted everything is: you don't need to know how the kernel initializes the CPU, nor do you need to know how it gets its clock; you just need to know how to access those abstracted resources.
But if the way of accessing resources depends on the OS, aren't we back to square one? We would be, if not for the standard library! This library provides functions like printf in a well-defined way. It doesn't matter if you're working on Linux running on PowerPC or on an ARM Windows: printf will always print things to the standard output the same way.
If you write standard C using only the standard library (and intend for your program to run in an OS) C is completely platform-independent!
EDIT: As noted in the comments below, even that is not quite enough. It doesn't really have anything to do with specific CPUs, but some things, such as the system function or the sizes of some types, are documented as implementation-defined. To make C really platform-independent, you need to stick to the well-defined functions of the standard library and follow some best practices (never rely on sizeof(int) == 4, for instance).
Thinking about 'what's a program' might help you understand your question. Is a program a collection of text (that you've typed in or otherwise manufactured) or is it something you run? Is it both?
In the case of a 'low-level' language like C I'd say that the text is the program source, and that this is turned into a program (aka executable) by a compiler. A program is something you can run. You need a C compiler for a system to be able to turn the program source into a program for that system. Once built, the program can only run on systems close to the one it was compiled for. However, there is a more interesting, if more difficult, question: can you at least keep the program source the same, so that all you need to do is recompile? The answer to this is 'sort-of no', I sort-of think. For example you can't, in pure C, read the state of the shift key. Of course operating systems provide such facilities and you can interface to them in C, but then such code depends on the OS. There may be libraries (e.g. the curses library) that provide such facilities across many OSes and that can help reduce the dependency, but no library can claim to portably cover all OSes.
In the case of a 'higher-level' language like Python I'd say the text is both the program and the program source. There is no separate compilation stage with such languages, but you do need an interpreter on a system to be able to run your Python program on it. That this is happening may not be obvious to the user, however, as you may well seem to be able to run your Python 'program' just by naming it, like you run your C programs. But this most likely comes down to the shell (the part of the OS that deals with commands) knowing about Python programs and invoking the interpreter for you. It can appear, then, that you can run your Python program anywhere, but in fact what you can do is pass the program to any Python interpreter.
In the zoo of programming there are not only many, very varied beasts, but new kinds of beasts arise all the time, and old beasts metamorphose. Terms like 'program', 'script' and even 'executable' are often used loosely.
I want to know which code and files in the glibc library are responsible for generating traps for floating point exceptions when traps are enabled.
Currently, GCC for RISC-V does not trap floating point exceptions. I am interested in adding this feature. So, I was looking at how this functionality is implemented in GCC for x86.
I am aware that we can trap signals as described in this question (Trapping floating-point overflow in C), but I want to know more details about how it works.
I went through files in glibc/math which according to me are in some form responsible for generating traps like
fenv.h
feenablxcpt.c
fegetexcept.c
feupdateenv.c
and many other files starting with fe.
All these files are also present in glibc for RISC-V. I am not able to figure out how glibc for x86 is able to generate traps.
These traps are usually generated by the hardware itself, at the instruction set architecture (ISA) level; that is the case on x86-64 in particular.
I want to know which code and files in the glibc library are responsible for generating traps for floating point exceptions when traps are enabled.
So there is no such file. However, the operating system kernel (notably with signal(7)-s on Linux...) translates the traps into something else.
Please read Operating Systems: Three Easy Pieces for more. And study the x86-64 instruction set in details.
A more familiar example is integer division by zero. On most hardware, that produces a machine trap (or machine exception), handled by the kernel. On some hardware (IIRC, PowerPC), it gives -1 as a result and sets some bit in a status register. Further machine code could test that bit. I believe that the GCC compiler would, in some cases and with some optimizations disabled, generate such a test after every division. But it is not required to do that.
The C language (read n1570, which practically is the C11 standard) has defined the notion of undefined behavior to handle such situations the most quickly and simply possible. Read Lattner's What every C programmer should know about undefined behavior blog.
Since you mention RISC-V, read about the RISC philosophy of the previous century, and be aware that designing out-of-order and super-scalar processors requires a lot of engineering efforts. My guess is that if you invest as much R&D (that means tens of billions of US$ or €) as Intel -or, to a lesser extent, AMD- did on x86-64 into a RISC-V chip, you could get comparable performance to current x86-64 processors. Notice that SPARC or PowerPC (or perhaps ARM) chips are RISC-like, and their best processors are nearly comparable in performance to Intel chips but got probably ten times less R&D investment than what Intel put in its microprocessors.
Beforehand: this is just a nasty idea I had last night :-)
Think about the following scenario:
You have some arm-elf executable and for some reasons you want to run it on your amd64 box without emulating.
To simplify the scenario, let's say we just want to deal with simple console applications which are just linked against libc and there are no additional architecture specific requirements.
If you want to transform binaries between different architectures you have to consider the following points:
Endianness of the architectures
bit-width of registers
functionality of different registers
Endianness should be one of the lesser problems.
If the bit-width of the destination registers is smaller than that of the source architecture, one could insert additional instructions to reproduce the same behaviour. The same applies to the functionality of registers.
Finally, (and before bashing down this idea), have a look at the following simple code snippet and its corresponding disassembly of the objects.
C Code
Corresponding ARM Disassembly
Corresponding AMD64 Disassembly
In my opinion it should be possible to convert those objects between different architectures. Even function calls (like printf) could be mapped or wrapped to the destination architecture's libc.
And now my questions:
Did anyone already think about realising this?
Is it actually possible?
Are there already some projects dealing with this issue?
Thanks in advance!
As per my knowledge, the -msse and -msse2 options of GCC improve performance by making arithmetic operations faster. I also read somewhere that they use more resources, such as registers and cache memory.
What about performance if we use an executable generated with these options on RTOS devices (like a VxWorks board)?
The OS must support SSE(2) instructions for your application to work correctly. From googling, it would seem that VxWorks supports this (and it's not really that hard: all it takes is for the OS to keep a 512-byte save area per task that uses SSE/SSE2; given the right circumstances it can be allocated on demand, but it's often easier to just allocate it for all tasks). Saving/restoring SSE registers is done "on demand", that is, only when a task different from the previous one to use SSE executes SSE instructions do the registers need to be saved. The OS uses a special interrupt (trap) to detect that "a new task is trying to use SSE instructions".
So, as long as the processor supports it, you should be fine.
I may not be able to directly answer your question, but here are a couple things I do know that may be of use:
SSE, SSE2, etc. must be supported/implemented by the processor for them to have any effect in the first place.
There are specific functions you can call that use these extended instructions for mathematical operations. These functions operate on wider data types or perform an operation on a set efficiently.
Enabling the options in GCC may make it use these APIs/builtins automatically. This is the part I am unsure about.
I have some old code that needs to be compiled with the -m486 flag in GCC.
But that flag no longer exists. I then found -mtune=i486 and -march=i486.
I have read this page.
But I still don't know which one is the right replacement for -m486.
The -march option defines the list of instructions that may be used, the -mtune option modifies the optimization process afterwards.
You would normally use -march to specify the minimum requirements, and -mtune to optimize for what the majority of users have.
For example, the IA32 architecture defines various instructions for string handling and repetition of instructions. On the 386 and 486, these are faster and smaller than explicit assembler code because the instruction fetch and decode stages can be skipped, while on newer models, these instructions clog up the instruction pipeline as each processing step is immediately dependent on the previous, so the CPU's parallel execution functionality goes to waste.
Linux distributions typically use -march=i486 -mtune=i686 to ensure that you can still install and run on a 486, but as the majority of users have modern CPUs, the focus is on making it run optimally for these.