Is it possible to programmatically alter an ARMv7-compiled binary to replace all the new opcodes and instructions with ARMv6-compatible ones?
I don't really care that much about performance at this point; I just want to use some ARMv7-only binaries on an ARMv6 (with VFP, if that matters).
VFP will matter if the binary uses floating-point instructions that are not supported on your ARMv6, unless those are among the ones you are replacing.
If we are talking only about ARM instructions that are not in ARMv6, it is probably a short list. The replacement will usually need more instructions than the original, though, so you cannot patch it in place: you would have to modify the code so that the ARMv7 instruction becomes a branch somewhere, that somewhere being the replacement code using ARMv6 or older instructions, which then branches back. Use an unconditional branch, ldr pc,something, etc., not branch and link. If you are talking about Thumb-2 code, that may still be possible but is probably more work, and there are some things you might not be able to do.
Short answer: yes, in general this kind of thing can be done.
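To make the branch-out-and-back idea concrete, here is a minimal Python sketch of the patching step. It assumes ARM-state (not Thumb) code, a flat image loaded at a known base address, and free space for the stub; the function names, addresses and the single `mov r0, r0` replacement word are hypothetical and only there for illustration. It does not handle conditional or PC-relative instructions, which a real tool would have to.

```python
import struct
from typing import Sequence


def encode_arm_branch(from_addr: int, to_addr: int) -> int:
    """Encode an unconditional ARM-state B (no link) from from_addr to to_addr."""
    # In ARM state the PC reads as the address of the instruction plus 8.
    offset = to_addr - (from_addr + 8)
    if offset % 4 != 0:
        raise ValueError("ARM branch targets must be word aligned")
    if not -(1 << 25) <= offset < (1 << 25):
        raise ValueError("target is out of range for a single B instruction")
    imm24 = (offset >> 2) & 0xFFFFFF
    # cond = 0b1110 (always), bits 27..25 = 0b101, L = 0 so lr is left untouched.
    return 0xEA000000 | imm24


def patch_with_stub(image: bytearray, base: int, insn_addr: int,
                    stub_addr: int, replacement: Sequence[int]) -> None:
    """Replace one 4-byte instruction with a branch to a stub.

    The stub holds the replacement instructions followed by a branch back
    to the instruction after the patched one.
    """
    # 1. Overwrite the unsupported instruction with "b stub".
    struct.pack_into("<I", image, insn_addr - base,
                     encode_arm_branch(insn_addr, stub_addr))
    # 2. Write the replacement sequence into the stub area.
    pos = stub_addr
    for word in replacement:
        struct.pack_into("<I", image, pos - base, word)
        pos += 4
    # 3. Branch back to the instruction that followed the patched one.
    struct.pack_into("<I", image, pos - base,
                     encode_arm_branch(pos, insn_addr + 4))


# Hypothetical layout: image loaded at 0x8000, stub placed at 0x8020.
image = bytearray(64)
patch_with_stub(image, base=0x8000, insn_addr=0x8004,
                stub_addr=0x8020, replacement=[0xE1A00000])  # mov r0, r0
```

The branch is deliberately encoded without the link bit, matching the advice above: a branch and link would overwrite lr and break the surrounding function.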
"C is a general purpose language, not tied to a particular system"
The C Programming Language, Brian W. Kernighan & Dennis M. Ritchie
Yet with the right compiler we can make a .exe which runs on every Windows machine, which in turn means on every CPU Windows runs on.
So my question is: does every x86-64 CPU (Intel or AMD) use the same instruction set? (Yes, I could make a comparison...) If not, then I'll have to assume that the compiler detects what CPU we're running on and uses the right instruction set at compile time.
Am I totally mistaken?
I barely know what I'm talking about so please bear with me.
Just a dude trying to look under the hood.
Thank you
Intel makes many different processor models that share a core instruction set of the “x86-64” family (and additional processor models that do not). Even among the processors with the shared core instructions, there are variations. Newer models may have instructions that older models did not, and some parts of the instruction set may be on certain models and not others.
Some instructions even behave differently on different processors.
When you compile a program, the compiler “targets” a particular combination of instruction subsets. This means the instructions in those subsets are available for the compiler to use when it is generating code. The compiler might or might not use any particular instruction or subset depending on its needs or choices when compiling a particular program. The resulting program is then suitable for processor models with the targeted instructions and not for other models (unless the compiler happened not to use any of the instructions not on those models, even though it could have).
Often, the default setting for the compiler's target is either a processor model like the one you are running on or some typical selection of instruction subsets that is common for modern processor models. The target may also be selected based on other settings you give the compiler, such as asking it to target a particular version of an operating system. However, you can pass the compiler switches to tell it to compile for entirely different targets, even for entirely different architectures, such as compiling for an ARM processor while running on an Intel processor.
Software is also part of a computer system, so the executable file the compiler produces may also depend on certain software libraries being available at run-time or certain operating system features being available.
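The point about targets can be modelled as plain set containment. The Python sketch below is only an illustration: the function names are hypothetical, and the feature names are used as labels, not as anything a compiler reports in this form. A program is guaranteed to run when every targeted subset is present on the CPU, and it may still happen to run on an older CPU if the compiler never actually used the missing subsets.

```python
def guaranteed_to_run(targeted, cpu):
    """True if every instruction subset the compiler was allowed to use exists on the CPU."""
    return set(targeted) <= set(cpu)


def happens_to_run(used, cpu):
    """True if every instruction subset the compiler actually used exists on the CPU."""
    return set(used) <= set(cpu)


# Illustrative feature sets (assumed for this example).
targeted = {"sse2", "avx", "avx2"}   # what the compiler was told it may use
used = {"sse2", "avx"}               # what ended up in the generated code
older_cpu = {"sse2", "avx"}
newer_cpu = {"sse2", "avx", "avx2"}

print(guaranteed_to_run(targeted, newer_cpu))  # True
print(guaranteed_to_run(targeted, older_cpu))  # False
print(happens_to_run(used, older_cpu))         # True
```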
How can I determine whether a NEON engine exists on a given ARM processor? Is there a status/flag register that can be queried for this purpose?
I believe unixsmurf's answer is about as good as you'll get if you are using an OS with a privileged kernel. For general-purpose feature detection, it seems ARM has made it a requirement to get this from the OS, so you must use an OS API to get it.
On Android NDK use #include <cpu-features.h> with (android_getCpuFamily() == ANDROID_CPU_FAMILY_ARM) && (android_getCpuFeatures() & ANDROID_CPU_ARM_FEATURE_NEON). Note this is for 32 bit ARM. ARM 64 bit has different flags but the idea is the same. See the sources/docs.
On Linux, if available use #include <sys/auxv.h> and #include <asm/hwcap.h> with getauxval(AT_HWCAP) & HWCAP_NEON.
On iOS, I'm not sure there is a dynamic call. The methodology seems to be that you build your app targeting NEON, then make sure your app is flagged to require NEON so that it will only install on devices which support it. Of course you should use the predefined preprocessor flag __ARM_NEON__ to make sure everything is in order at compile time.
On whatever Microsoft does or if you are using some other RTOS... I don't know...
Actually you'll see a lot of Android implementations which just parse /proc/cpuinfo in order to implement android_getCpuFeatures().... Heh. But still it seems to be getting improved and newest versions use the getauxval method.
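For reference, the /proc/cpuinfo approach mentioned above amounts to something like the following Python sketch. It assumes the usual Linux layout, where 32-bit ARM kernels print a "Features" line (x86 kernels call it "flags"); the sample text and function names are made up for illustration, and on AArch64 the relevant names are "asimd" and "sve" instead of "neon".

```python
def cpuinfo_features(cpuinfo_text: str) -> set:
    """Collect the feature names listed on Features/flags lines of /proc/cpuinfo text."""
    features = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("features", "flags"):
            features.update(value.split())
    return features


def has_feature(cpuinfo_text: str, name: str) -> bool:
    """True if the named feature appears in the given /proc/cpuinfo text."""
    return name in cpuinfo_features(cpuinfo_text)


# Made-up sample in the style of a 32-bit ARM Linux system.
SAMPLE = """processor\t: 0
model name\t: ARMv7 Processor rev 4 (v7l)
Features\t: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt
"""

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # not on Linux: fall back to the sample
    print("neon" in cpuinfo_features(text))
```

Parsing text like this is fragile compared with getauxval(AT_HWCAP), which is why it is better treated as a fallback.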
One reliable way is to check the architectural feature trap register. For example, on the ARM Cortex-A35, you can check the value of the HCPTR register to see whether NEON is implemented (0x000033FF) or not (0x0000BFFF). The register name and the indicating value are platform dependent, so make sure to check the technical reference manual.
Is there any way to check whether NEON and SVE are supported?
I have seen someone say something about the HCPTR register, but it does not seem to have any relationship to NEON, and besides, it looks to be an AArch32 register according to the docs:
https://developer.arm.com/docs/ddi0595/g/aarch32-system-registers/hcptr
I first came across the ARM instruction set in the 80s, and have not used it since. Out of curiosity I was looking at tablets and other ARM devices, and I note that the CPUs are produced by different manufacturers.
I did a quick search but I couldn't find a definitive statement as to whether the different ARM chips have differing instruction sets.
I would assume that in the main they are the same.
Go to http://infocenter.arm.com and, along the left under Contents, look for ARM architecture, and under that Reference Manuals. There used to be a single ARM ARM (ARM Architecture Reference Manual), but the family has grown to the point where they had to break it into, well, families.
The ARM ARMs are going to show you the instruction sets. What I think they call the ARMv5 manual is the old ARM ARM. You will find the ARM instructions (32 bit) and Thumb instructions (16 bit). For each instruction they list which architectures support it, so you might see an ARMv5 instruction that is not supported by ARMv4 (ARMv4 a.k.a. ARM7, like the popular ARM7TDMI core). Thumb instructions are supported by ARMv4T and newer, etc.
So there is the core 32-bit ARM instruction set, which you may have been used to, with new instructions added from time to time and bugs/restrictions fixed (ldr r0,[r0] for example), etc.
The floating-point unit has had one or two overhauls. Most cores do not have an FPU, and for the ones that can have one, that doesn't mean the chip vendor included it in the chip. The FPA is the older one, VFP is newer, and now there is the NEON stuff. If you pay attention, these all fall into the generic coprocessor instructions category. But you don't have to know or use the coprocessor versions; they have aliases for everything.
There is/was this Java/Jazelle thing. Same story: some cores might have it as an option, but that doesn't mean the vendor included it.
There are at least two sets of Thumb-2 extensions to the Thumb instruction set. Before the Thumb-2 extensions, the Thumb instructions were all 16 bits wide and had a one-to-one mapping to an ARM instruction. That makes sense: you only need an ARM core, and the decoder translates the smaller instruction into an ARM instruction and feeds that to the core. All instructions are 16 bits except the branch (bl), and if you look at that pattern you can quite easily decode it as two separate 16-bit instructions.
Then they decided to make their microcontroller offering smaller. Instead of everyone just using the ARM7TDMI and paying for its chip area and power, the Cortex-M processors are Thumb only: they do not support 32-bit ARM instructions, there is no ARM core that Thumb instructions are translated to, and so on. It is a new core. ARMv6-M (the Cortex-M0 and Cortex-M1) takes the Thumb instruction set and adds a few 32-bit instructions to close the performance gap to ARM. (Thumb was smaller, yes, but a little slower than ARM: in my experiments, compiling the same code for both took something like 10-20% more instructions in Thumb.) In theory Thumb-2 (ARMv7-M) outperforms ARM when and where you can compare them. For whatever reason the Cortex-M3 came out first; it is ARMv7-M and has a bunch of 32-bit Thumb-2 instructions added to the Thumb instruction set. I recently counted: ARMv6-M added about 20 instructions, and ARMv7-M has about 140-150 instructions added to the base Thumb instruction set.
Thumb-2 is basically variable word length. And again, these -M variants only run on the Cortex-M series. Looking at it, it is almost as if they rebuilt the ARM instruction set under the name Thumb. Not completely, but you get back a lot of ARM-like instructions: three registers instead of two, being able to reach the higher registers and use immediates, etc.
What this caused is a desire to write asm that assembles for both ARM and Thumb/Thumb-2. So they came up with a unified syntax. You can write an instruction like
add r0,r1
If you are assembling for Thumb, that is the instruction. If you are assembling for ARM, the assembler will convert it to
add r0,r0,r1
for you, instead of any syntax errors. You have to specify that you are using the unified syntax, at least with the gnu binutils assembler (gas).
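As a toy illustration of that rewrite (not how gas is implemented), the Python sketch below expands the two-operand form into the three-operand form for a small, assumed set of data-processing mnemonics. The mnemonic list and function name are hypothetical; a real assembler works on parsed operands and encodings, not on strings.

```python
# Mnemonics assumed, for this sketch only, to accept the two-operand shorthand.
TWO_OPERAND_SHORTHAND = {"add", "sub", "and", "orr", "eor"}


def expand_two_operand(line: str) -> str:
    """Rewrite e.g. 'add r0,r1' as 'add r0,r0,r1'; leave everything else alone."""
    mnemonic, _, operands = line.strip().partition(" ")
    ops = [op.strip() for op in operands.split(",")]
    if mnemonic.lower() in TWO_OPERAND_SHORTHAND and len(ops) == 2:
        # The destination register doubles as the first source register.
        ops = [ops[0], ops[0], ops[1]]
    return f"{mnemonic} {','.join(ops)}"


print(expand_two_operand("add r0,r1"))     # add r0,r0,r1
print(expand_two_operand("add r0,r1,r2"))  # add r0,r1,r2
print(expand_two_operand("mov r0,r1"))     # mov r0,r1
```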
An equally important set of documents is the Technical Reference Manuals (TRMs), also at infocenter.arm.com. Each core has a TRM; actually, each revision of each core has a TRM. The extra-cost items like L2 caches also have their own TRM for each revision. It is important to find out which core the chip vendor bought/used and, if possible, the revision (rev 2.0 is r2p0, rev 1.0 is r1p0, etc.), as there are programming differences as well as errata differences between them. (Don't trust Linux as a reference! It is a huge mess: every time I look, yet another company has completely misunderstood and misapplied core/errata differences. It is a bit of a disaster at the moment.) Sometimes the TRM includes instruction information, or paints a clearer picture of what that core does and doesn't support.
The ARM ARMs are generic: they cover the whole family or a number of families of cores, whereas the TRM is very specific to one core. An example of confusion between the ARM ARM and the TRM: looking at the ARM ARM you might get the impression that you can use either the BE-32 or the BE-8 big-endian mode. The reality is that you have one or the other. ARMv6 and newer is BE-8, period, get used to it. ARMv5 and ARMv4 are BE-32, which before ARMv6 was just called big endian. I highly recommend NOT using big endian on an ARM despite what you think you might gain from it. Go with the native mode and you will save yourself a ton of work and failure. I mention this from personal experience of trying to figure out why the bits described in an ARM ARM just didn't work in the core I was using.
A 64-bit core is somewhere in the development phase; I wouldn't be surprised if it is done and just waiting for someone to pull the trigger and use it. Actually, the ARMv8 doc is available; downloading now.
Short answer: at infocenter.arm.com under ARM Architecture you will find all the docs describing the different instruction sets as well as the improvements/additions to those instruction sets over time.
There is no difference (with respect to the instruction set) between manufacturers.
They all respect the ARM specification.
Some extensions are optional. This is the case with NEON.
But, as far as I know, only the Tegra 2 does not include this extension.
This is why the Tegra 2 is a very bad processor for video decoding (for example).
There are a few common variations of instruction sets, UAL, Thumb and Thumb2 being the most common. Some ARM cores that contain specialized hardware (such as DSPs) extend the language as well.
This used to not be the case. ARM required adherence to their spec. ARM ships IP which of course adheres to their spec but they also require the architecture licensees to adhere to it. However, that changed slightly in 2019 when ARM began to allow custom instructions with their embedded CPUs.
The C language was used to write UNIX to achieve portability -- the same C language program compiled using different compilers produces different machine instructions. How come Windows OS is able to run on both Intel and AMD processors?
AMD and Intel processors(*) have a large set of instructions in common, so it is possible for a compiler or assembler to write binary code which runs "the same" on both.
However, different processor families even from one manufacturer have their own sets of instructions, usually referred to as "extensions" or whatever. Ignoring the x87 co-processor, the first time I remember this being a marketing point was when everything suddenly went "with MMX(TM) technology". Binary code expected to run on any processor either needs to avoid extensions, or to detect the CPU type before using them.
Intel's Itanium 64-bit architecture was completely different from AMD's x86-64 architecture, so for a while their 64bit offerings were non-compatible (and Itanium was nothing like x86, whereas x86-64 extended the instruction set by adding 64bit instructions). Intel blinked first and adopted x86-64, although there are still a few differences: http://en.wikipedia.org/wiki/X86-64#Differences_between_AMD64_and_Intel_64
Windows probably uses the common x86 or x86-64 instruction set for almost all code. I wouldn't be surprised if various drivers and codecs are shipped in multiple versions, and the correct one selected once the CPU has been interrogated.
(*) Actually, Intel make or have made various kinds of processors, including ARM (Intel's ARM processors were called XScale, but I think they've sold that business). And AMD make other processors too. But we know which Intel/AMD processors you mean :-)
AMD are Intel compatible, otherwise they would never have gained a foothold in the market place.
They are effectively clone compatible.
As you suspect, the main stream Intel and AMD processors have the same instruction set.
Windows built for x86 does not run on ARM or PowerPC chips, for example, because the compiled code is dependent on the underlying instruction set.
However, most of Windows is written in C++ (as far as I know), which should be portable to other architectures. Windows NT even ran on PowerPC and other architectures.
Intel's 80x86 CPUs and AMD's 80x86 CPUs are mostly the same, but some things are completely different (e.g. virtual machine extensions - SVM vs. VT-x) and some things (extensions) may or may not be supported. However, some things are different on different CPUs from the same manufacturer too (e.g. some Intel chips support AVX2 and some don't).
There are multiple ways to deal with the differences:
only use the common subset so the same code runs on all 80x86 CPUs (e.g. treat it like an 8086 chip).
use a subset of features that is common to a range of CPUs so the same code runs on all 80x86 CPUs in that range. This is very common (e.g. "this software requires an 80x86 CPU (and OS) that supports 64-bit extensions").
use install-time tests. For example, there might be 4 different copies of software (compiled for 4 different ranges of CPUs) where the installer decides which copy makes sense for the computer the software is being installed on.
use run-time tests. For example, code can use the CPUID instruction to do if( AVX2_is_supported() ) { set_function_pointers_so_AVX2_is_used(); } else {set_function_pointers_so_AVX2_is_not_used(); }. Note: Some compilers (Intel's ICC) can automatically generate code that does run-time tests.
These aren't mutually exclusive options. For example, the installer might decide to install a 64-bit version (and not a 32-bit version), and then the 64-bit version might check which features are supported at run-time and have different code to use different features.
Also note that different parts of an OS can be treated separately. For example, an OS could have 6 different boot loaders, 4 different "HALs", 4 different kernels, and 3 different "kernel modules" to support virtualisation; where some of these things might do run-time tests and some might not.
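The run-time test option can be sketched as follows. Python cannot execute CPUID directly, so this example assumes the set of supported features has already been obtained from somewhere (the OS, for instance); the function names are hypothetical and the "AVX2" routine is only a stand-in that computes the same result in chunks of four.

```python
from typing import Callable, Iterable, Sequence


def sum_squares_generic(values: Sequence[float]) -> float:
    """Baseline version that only needs the common instruction subset."""
    return sum(v * v for v in values)


def sum_squares_avx2(values: Sequence[float]) -> float:
    """Stand-in for a build of the same routine that would use AVX2."""
    total = 0.0
    # Walk the data four elements at a time, mimicking a 256-bit vector of doubles.
    for i in range(0, len(values), 4):
        total += sum(v * v for v in values[i:i + 4])
    return total


def select_sum_squares(cpu_features: Iterable[str]) -> Callable[[Sequence[float]], float]:
    """Pick an implementation once, the way a function pointer would be set at start-up."""
    if "avx2" in set(cpu_features):
        return sum_squares_avx2
    return sum_squares_generic


# Assumed feature set for illustration; real code would query the CPU or the OS.
sum_squares = select_sum_squares({"sse2", "avx", "avx2"})
print(sum_squares([1, 2, 3, 4, 5]))
```

Doing the selection once and keeping the chosen function around avoids repeating the feature test on every call, which is the reason the answer talks about setting function pointers.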
Do Intel and AMD processor have the same assembler?
Almost all assemblers for 80x86 support almost all extensions (from all CPU manufacturers - e.g. Intel, AMD, VIA, Cyrix, SiS, ...). In general; it's up to the programmer (or compiler) to make sure they only use things that they know exist. Some assemblers provide features to make this easier (e.g. NASM provides a CPU ... directive so that the programmer can tell the assembler to generate errors if it sees instructions that aren't supported on the specified CPU).
AMD and Intel use the same instruction set.
When you install windows on an AMD processor or an Intel processor, it doesn't "compile" code on the machine.
I remember many people being confused about this back in college. They believed that "setup" means compiling code on your machine. It doesn't. Most if not all Windows applications, outside of the free/open-source realm, are given to you as binaries.
As for portability, that isn't necessarily 100% true. While C is highly portable, in many cases code written for a specific OS or system can only be compiled and executed on that system. For example, certain Unix machines handle files and directories differently, so such code might not be 100% portable.
Do Intel and AMD processor have the same assembler?
An assembler assembles a program to be run on a processor, so your question is flawed. Processors DO NOT use assemblers.
If you mean "can Intel and AMD processors run the same assembler?", then the answer is YES!
All an assembler is, is a program that assembles other programs from structured text files. NASM is an example of an assembler.
First Question
From a C programmer's point of view, what are the differences between Intel Core processors and their AMD equivalents?
Related Second Question
I think that there are some instructions that differentiate the Intel Core from the other processors and vice versa. How important are those instructions? Are they being taken into account by compilers? Would performance be better if there was some special Intel compiler only for the Core family?
If you are programming user-level code and most driver code, there aren't many differences (one exception is the availability of certain instruction sets - which may differ for different processors, see below). If you are writing kernel code dealing with CPU-specific features (profiling using internal counters, memory management, power management, virtualization), the architectures differ in implementation, sometimes greatly.
Most compilers do not automatically take advantage of SSE instructions. However, most do provide SSE-based intrinsics, which will allow you to write SSE-aware code. The subset of all SSE levels available differs for each processor architecture and maker.
See this page for instruction listings. Follow the links to see which architectures the specific instructions are supported on. Also, read the Intel and AMD architecture development manuals for exact details about support and implementation of any and all instruction sets.
First Question: From a C programmer's point of view, what are the differences between Intel Core processors and their AMD equivalents?
The most significant differences are likely to show up only in highly specialized code that makes use of new generation instructions, such as vector maths, parallelization, SSE.
Would performances be better if there was some special Intel compiler only for the Core family ?
Not sure if you are aware of it, but there's a compiler specifically for Intel cores: icc. It's generally considered to be the best compiler from an optimization point of view.
You might want to check out its Wikipedia article.
According to the Intel Core Wikipedia article, there were notable improvements to SSE, SSE2, and SSE3 instructions. These instructions are SIMD (single instruction, multiple data), meaning that they are designed for applying a single arithmetic operation to a vector of values. They are certainly important, and have been used by compilers such as GCC for quite a while.
Of course, recent AMD processors have adopted the newest Intel instructions, and vice-versa. This is an ongoing trend.